Preface



Intelligent agents will be the necessity of the coming century. Software agents 
will pilot us through the vast sea of information, by communicating with other 
agents. A group of cooperating agents may accomplish a task which cannot be 
done by any subset of them. 

This volume consists of selected papers from PRIMA’99, the second Pacific
Rim International Workshop on Multi-Agents, held in Kyoto, Japan, on
December 2-3, 1999.

PRIMA constitutes a series of workshops on autonomous agents and
multi-agent systems, integrating the activities in Asia and the Pacific rim
countries, such as MACC (Multiagent Systems and Cooperative Computation) in
Japan, and the Australian Workshop on Distributed Artificial Intelligence.
The first workshop, PRIMA’98, was held in conjunction with PRICAI’98 in
Singapore.

The aim of this workshop is to encourage activities in this field, and to bring
together researchers from Asia and the Pacific rim working on agents and multiagent
issues. Unlike usual conferences, this workshop mainly discusses and explores 
scientific and practical problems as raised by the participants. Participation is 
thus limited to professionals who have made a significant contribution to the 
topics of the workshop. 

Topics of interest include, but are not limited to: 

- multi-agent systems and their applications 

- agent architecture and its applications 

- languages for describing (multi-) agent systems 

- standard (multi-) agent problems 

- challenging research issues in (multi-) agent systems 

- communication and dialogues 

- multi-agent learning

- other issues on (multi-) agent systems 

We received 43 submissions to this workshop from more than 10 countries. 
Each paper was reviewed by at least two program committee (PC) members who 
are internationally renowned researchers. After careful consideration, 17 papers 
were selected for these proceedings. We would like to thank all the authors who 
submitted their papers to this workshop. We would also like to thank all the 
PC members for their quality work. Special thanks go to the keynote speaker,
Professor Michael Georgeff from the Australian Artificial Intelligence Institute,
for his support.

For more information about PRIMA, please check the following web pages: 

PRIMA Web page http://www.lab7.kuis.kyoto-u.ac.jp/prima/ 

PRIMA’99 Web page http://www.lab7.kuis.kyoto-u.ac.jp/prima99/ 
This workshop is held in cooperation with:

- IEICE (The Institute of Electronics, Information and Communication
Engineers), Japan

- ETL (COE Global Information Processing Project), MITI, Japan

- MACC (Multi-Agent and Cooperative Computation), Japanese Society for
Software Science and Technology
Chengqi Zhang
Abstract. In this paper, we propose a flexible multi-agent model, called
Field Reactor Model (FRM), for open system environments such as ubiquitous
computing. FRM unifies indirect communication with an abstract medium and
pattern-oriented message communication. The collaboration method among
agents is pattern directed message collaboration, which yields functional
relations among patterns of agents. Pattern directed message collaboration
enables agents on heterogeneous platforms to collaborate with each other and
supports changing the collaboration dynamically. We describe how to apply
the computation scheme originating from dataflow to pattern directed message
collaboration. We also show the flexibility of FRM with an example of file
format translation.



1 Introduction 

Widespread information infrastructures such as the Internet and radio
networks are providing an environment for ubiquitous computing [1], proposed
by Mark Weiser at Xerox PARC. Ubiquitous computing is the concept that users
can use computers anytime, anywhere.

In order to realize ubiquitous computing, we need a technology that
supports flexible distributed object collaborations. The technology should
support object collaborations that provide adequate services even if the
environment around the objects changes dynamically as the user environment
changes. Furthermore, the technology should also support collaboration among
objects on heterogeneous platforms, because the environment of ubiquitous
computing relies on platforms such as media and OS.

However, current distributed object systems such as CORBA [2] and
DCOM [3] are not sufficient to support ubiquitous computing. The
collaboration between server and client objects in current distributed
object systems is strict. The basic concept of these systems is a
server-client model. Programmers have to use stubs of server objects to
implement the client program code. Therefore, they must specify the server
objects before accessing them. In current distributed object systems,
collaborations among objects have to be fixed when the client object is
implemented. This also implies that the collaborations cannot be changed
dynamically. Moreover, collaboration is impossible when server objects are
inaccessible. Hence, it is difficult to support flexible object
collaborations in a ubiquitous computing environment with these systems.

We introduce a new flexible multi-agent model, called Field Reactor Model
(FRM), which unifies indirect communication with an abstract medium and
pattern-oriented message communication. A field is an abstract medium, which
is free of the physical media, and dispatches messages to all agents. Each
agent sends messages into a field without explicitly specifying its
destination agents. Reactors are agents in the field, which have
transformation rules for messages using pattern matching. All agents listen
to all messages in the field, and react to the messages that satisfy their
own message interpretation criteria. The collaboration method in FRM is
pattern directed message collaboration, which yields functional relations
among patterns of agents. A combination of patterns of agents determines
collaboration. Pattern directed message collaboration provides a method for
multi-agent collaboration using patterns on heterogeneous platforms.

Collaborations are flexible in FRM. Agents can be added to and deleted
from the field at any time, independently of other agents. The relations
among the patterns of the agents determine the agents’ collaboration. Thus,
a change of patterns dynamically changes the collaboration.

Furthermore, agents in FRM can collaborate over various media and hetero- 
geneous platforms. Several fields on various networks can be combined into one 
logical field. Therefore, a logical field enables agents to collaborate across various 
media and heterogeneous platforms. 

In this paper, we introduce FRM in section 2. Section 3 describes pattern
directed message collaboration and the description method for multi-agent
collaboration using patterns. This section forms the core of this paper,
where we show how to apply the computation scheme originating from dataflow
to pattern directed message collaboration. In section 4, we show the
flexibility of FRM using an example. The example shows file format
translation through collaboration among agents on heterogeneous platforms,
and a dynamic change of system behavior. The example system has been
developed and is currently running in a LAN environment.



2 Field Reactor Model 

FRM unifies indirect communication with an abstract medium and
pattern-oriented message passing among agents. Agents communicate with each
other using the abstract medium and exchange messages according to the
interpretation criteria of each agent. An abstract diagram of FRM is shown
in figure 1.

Pattern Directed Message Collaboration in FRM

Fig. 1. An abstract diagram of Field Reactor Model. 1) A reactor sends a
message into the field. 2) Reactors in the field receive the message from the
field. 3) If the message matches a pattern in the reactor table, the function
that is set with the pattern is invoked and sends a newly generated message
into the field.

FRM consists of an abstract medium (called a Field) and agents (called
Reactors) residing in the field. A field is a basis for indirect
communication and has only one feature: dispatching all messages to all
reactors. A field is an abstract medium that is built over various media. We
can consider multiple fields that connect to each other as one logical
field. A reactor has its own matching table, called a reactor table.
Reactors receive messages from the field and send messages into the field
based on their own interpretation criteria using their reactor table. The
system behavior is determined by the chain of reactions among reactors.

2.1 Field 

A field is a logical smart multicast medium for communication among agents, 
and dispatches messages to all agents in the field. The field is independent of 
physical media and network protocols. 

We consider various physical media as fields. For example, a shared memory
field provides a high-speed, closed collaboration area inside a computer. To
create a shared memory field, we only create a management module for
reactors, a dispatch box module, and one event notification module on the
system. Another example is a UDP broadcast field. The UDP broadcast field
provides a subnet communication area on a LAN at its network speed. To
create a UDP broadcast field, we only bind a socket port and a broadcast
address as a field. In this way, various types of fields are created on
various physical media, and have the characteristics of those physical
media.
Fields on various physical media are integrated into one logical field.
Gateways connect two fields, and transport messages from one field to
another. For example, agents in a UDP broadcast field can send messages to
and receive messages from an agent in a shared memory field through a
gateway, and vice versa. Also, when gateways transport messages from one
field to another, they filter the messages with their patterns. Therefore, a
logical field supports collaboration among agents over different physical
media, network protocols, and platforms.

Messages in a field are also independent of programming languages and
platforms such as OS and ORBs. A message might be a string, an XML document,
or a tuple.

As mentioned above, agents across various physical media and platforms can 
collaborate with each other on a field through messages in the field. 
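
As a rough illustration of how a gateway might join two fields into one
logical field, the following Python sketch models each field as an in-memory
dispatcher and a gateway as a one-way forwarder that filters messages by
pattern. The class Field, the helpers matches and connect, the field names,
and the convention that "*" is a trailing wildcard are all assumptions made
for this sketch; they are not taken from the FRM implementation.

```python
WILDCARD = "*"


def matches(pattern, message):
    """Return True if message belongs to the set of tuples described by pattern.

    Assumption: "*" may only appear as the last element of a pattern and
    stands for any (possibly empty) remainder of the tuple.
    """
    if pattern and pattern[-1] == WILDCARD:
        prefix = tuple(pattern[:-1])
        return tuple(message[:len(prefix)]) == prefix
    return tuple(pattern) == tuple(message)


class Field:
    """In-memory stand-in for a field: every message goes to every listener."""

    def __init__(self, name):
        self.name = name
        self.listeners = []  # callables that receive each message
        self.log = []        # messages that have appeared in this field

    def send(self, message):
        message = tuple(message)
        self.log.append(message)
        for listener in list(self.listeners):
            listener(message)


def connect(source, target, pattern):
    """One-way gateway: copy messages matching pattern from source to target."""
    def forward(message):
        if matches(pattern, message):
            target.send(message)
    source.listeners.append(forward)


udp_field = Field("udp-broadcast")
shm_field = Field("shared-memory")
connect(udp_field, shm_field, ("Request", "*"))

udp_field.send(("Request", "PS", "report.ps"))  # forwarded by the gateway
udp_field.send(("Bid", "agent-1"))              # filtered out by the gateway
print(shm_field.log)
```

The sketch forwards in one direction only. A gateway that forwards in both
directions would also have to recognize messages it has already transported,
so that a message is not passed back and forth indefinitely.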



2.2 Reactor 

A reactor is an autonomous agent that receives messages from a field, interprets 
the messages, and sends a response message into the field. 

A reactor has a reactor table and a set of functions. The reactor table consists 
of multiple sets of matching patterns for messages, called message patterns, and 
their corresponding function. 

Each reactor behaves as follows: reactors in a field receive all messages
from the field and check whether each message matches a message pattern in
their own reactor table. If a message matches a pattern in the reactor
table, then the corresponding function is invoked. Otherwise the message is
ignored. When a new message is generated in the invoked function, the new
message is sent back into the field.

The reactor table makes each reactor autonomous in deciding which messages
activate it. The reactor table connects message patterns to functions. If
several reactor tables bind the same message pattern to different functions,
the reactors will perform different actions and send different messages back
into the field when they receive a message matching that pattern. Therefore,
reactors respond to messages based on their own interpretation criteria.

Additionally, a reactor can be added to or deleted from a field independently 
of other reactors, because the field isolates reactors. Reactors interact only with 
messages in the field. 
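
The behavior of fields and reactors described in this section can be
sketched in a few lines of Python. This is only a single-process
approximation under stated assumptions: the names Field, Reactor, on, react,
and send are hypothetical, "*" is treated as a trailing wildcard, and
dispatching is sequential rather than concurrent. Two reactors bind the same
message pattern to different functions, so each answers the same message in
its own way.

```python
from collections import deque

WILDCARD = "*"


def matches(pattern, message):
    """Assumed semantics: a trailing "*" matches any remainder of the tuple."""
    if pattern and pattern[-1] == WILDCARD:
        prefix = tuple(pattern[:-1])
        return tuple(message[:len(prefix)]) == prefix
    return tuple(pattern) == tuple(message)


class Reactor:
    """An agent holding a reactor table of (message pattern, function) entries."""

    def __init__(self, name):
        self.name = name
        self.table = []

    def on(self, pattern, function):
        self.table.append((tuple(pattern), function))

    def react(self, message):
        """Invoke the function of every matching entry; collect new messages."""
        produced = []
        for pattern, function in self.table:
            if matches(pattern, message):
                reply = function(message)
                if reply is not None:
                    produced.append(tuple(reply))
        return produced


class Field:
    """Dispatches every message to every reactor currently in the field."""

    def __init__(self):
        self.reactors = []
        self.log = []

    def add(self, reactor):
        self.reactors.append(reactor)

    def remove(self, reactor):
        self.reactors.remove(reactor)

    def send(self, message, max_steps=100):
        queue = deque([tuple(message)])
        while queue and max_steps > 0:  # max_steps guards against endless chains
            current = queue.popleft()
            self.log.append(current)
            for reactor in list(self.reactors):
                queue.extend(reactor.react(current))
            max_steps -= 1


field = Field()
greeter = Reactor("greeter")
greeter.on(("hello", "*"), lambda m: ("greeting", "welcome " + m[1]))
shouter = Reactor("shouter")
shouter.on(("hello", "*"), lambda m: ("greeting", m[1].upper() + "!"))
field.add(greeter)
field.add(shouter)
field.send(("hello", "alice"))
print(field.log)
```

Because the reactors only see messages, calling field.remove(shouter) changes
the behavior of the system without touching the greeter reactor.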

3 Pattern Directed Message Collaboration 

FRM provides flexible collaboration among reactors, and the collaboration
can change dynamically. The method of collaboration among reactors is
pattern directed message collaboration, which yields functional relations
between the message patterns of reactors. In pattern directed message
collaboration, the relations among patterns are important. A pattern in FRM
describes a set of corresponding messages and is independent of platforms
and media. A pattern matches a target message if the message is included in
the set that the pattern describes. Note that patterns in FRM can have
relations with each other, such as inclusion. These relations are important
for the flexibility of the system. In this paper, we use a tuple of texts as
a message. To represent a pattern we use the symbol *, which matches any
tuple. For example, the pattern [a, b, *] matches the tuples [a, b, c] and
[a, b, d, e]. Also, we can make a pattern include or exclude another pattern
by using the symbol *. For example, given the two patterns [a, b, *] and
[a, *], the pattern [a, *] includes the pattern [a, b, *].
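
The matching and inclusion relations above can be made concrete with a small
Python sketch. It assumes that "*" appears only as the last element of a
pattern and stands for any remainder of the tuple, including an empty one;
the text does not state these details, and the function names matches and
includes are chosen for this sketch only.

```python
WILDCARD = "*"


def matches(pattern, message):
    """Return True if the message tuple is in the set described by pattern."""
    pattern, message = tuple(pattern), tuple(message)
    if pattern and pattern[-1] == WILDCARD:
        prefix = pattern[:-1]
        return message[:len(prefix)] == prefix
    return pattern == message


def includes(outer, inner):
    """Return True if every message matched by inner is also matched by outer."""
    outer, inner = tuple(outer), tuple(inner)
    inner_open = bool(inner) and inner[-1] == WILDCARD
    inner_fixed = inner[:-1] if inner_open else inner
    if outer and outer[-1] == WILDCARD:
        prefix = outer[:-1]
        return inner_fixed[:len(prefix)] == prefix
    # Without a wildcard, outer describes exactly one message.
    return not inner_open and outer == inner


print(matches(("a", "b", "*"), ("a", "b", "c")))       # True
print(matches(("a", "b", "*"), ("a", "b", "d", "e")))  # True
print(includes(("a", "*"), ("a", "b", "*")))           # True
print(includes(("a", "b", "*"), ("a", "*")))           # False
```

Under this assumption two patterns are either nested or disjoint, which keeps
the inclusion test a simple prefix comparison.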



3.1 Concept of Pattern Directed Message Collaboration 

The concept of pattern directed message collaboration is based on relations
of message patterns. A message pattern is converted to another message
pattern by a reactor. Sets of message patterns define collaboration among
agents. The patterns in these sets are related to each other. We introduce
an input-pattern and an output-pattern to describe these relations
distinctly.

Reactors react to a message in the field by matching the message against
the entries in their reactor table. We define an input-pattern as a pattern
in a reactor table. When a message matches an input-pattern, the reactor
invokes the function of the input-pattern. A new message is generated in the
function. We call the pattern of the newly generated message an
output-pattern.

A reactor table implicitly connects the input-patterns and the
output-patterns. The reactor table holds a set of input-patterns and
corresponding functions, which generate its output-patterns. The
relationship among reactors is determined by the relations between
input-patterns and output-patterns. Input-patterns and output-patterns are
held in each reactor. A new relationship among reactors is created when new
reactors are plugged into the field.




Fig. 2. Example of pattern directed message collaboration 



For example, consider three reactors, R1, R2, and R3, that exist in a field.

Reactor R1 has input-pattern I1 and output-pattern O1. Reactor R1 reacts to
messages matching pattern I1, and sends messages having pattern O1. We
assume that input-pattern I2 of reactor R2 includes output-pattern O1 of
reactor R1, and that the pattern O1 includes input-pattern I3 of R3. The
relations among reactors are constructed as in figure 2. Reactors
collaborate with each other according to the relationship between the
input-patterns and the output-patterns. In this case, reactor R2 always
reacts to output messages of reactor R1, while reactor R3 reacts to them
only when the message is included in pattern I3. In the system, R1 always
sends messages to R2. When output messages of R1 are included in pattern I3,
R1 sends the messages to R2 and R3 simultaneously.
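
The three-reactor example can be checked mechanically. The Python sketch
below classifies how a reactor responds to another reactor's output from the
inclusion relation between an output-pattern and an input-pattern. The
concrete patterns chosen for O1, I2, and I3 are hypothetical, as are the
function names; the sketch also assumes that "*" is a trailing wildcard, in
which case two patterns are always either nested or disjoint.

```python
WILDCARD = "*"


def fixed_part(pattern):
    """Return (prefix, open_ended) for a pattern with an optional trailing "*"."""
    pattern = tuple(pattern)
    if pattern and pattern[-1] == WILDCARD:
        return pattern[:-1], True
    return pattern, False


def includes(outer, inner):
    """Return True if every message matched by inner is also matched by outer."""
    outer_prefix, outer_open = fixed_part(outer)
    inner_prefix, inner_open = fixed_part(inner)
    if outer_open:
        return inner_prefix[:len(outer_prefix)] == outer_prefix
    return not inner_open and inner_prefix == outer_prefix


def reaction(output_pattern, input_pattern):
    """Classify how the owner of input_pattern reacts to output_pattern."""
    if includes(input_pattern, output_pattern):
        return "always"      # every output message matches the input-pattern
    if includes(output_pattern, input_pattern):
        return "sometimes"   # only outputs that fall inside the input-pattern
    return "never"


O1 = ("doc", "ps", "*")        # hypothetical output-pattern of R1
I2 = ("doc", "*")              # hypothetical input-pattern of R2, includes O1
I3 = ("doc", "ps", "a4", "*")  # hypothetical input-pattern of R3, inside O1

print("R2:", reaction(O1, I2))  # R2: always
print("R3:", reaction(O1, I3))  # R3: sometimes
```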



3.2 Basic Design of Pattern Directed Message Collaboration 

In order to control collaboration among agents, we need descriptions that
control the collaboration flow. The control of collaboration is described
using programming constructs: sequence, function definition, function
invocation, loop, if-then-else, and case. We also need parallel-fork and
synchronous constructs in order to control parallel processing.

Pattern directed message collaboration provides a method to describe the
control of collaboration among agents. We can describe the collaboration
among agents with programming constructs. The programming constructs for
collaboration are shown in figure 3. The principle underlying the program
constructs and their computation scheme basically originates from the
dataflow computation scheme [4], except that execution control is performed
as message collaboration in a field.




Fig. 3. Program constructs for collaboration in pattern directed message
collaboration: (a) Sequence, (b) Function definition, (c) Function
invocation, (d) If-then-else, (e) Case, (f) Loop, (g) Parallel-fork,
(h) Synchronous



a) Sequence 

When an input-pattern I2 of a reactor R2 includes an output-pattern O1
of another reactor R1, R2 reacts to the output message of R1. This means
a sequential activation of reactors R1 and R2. By this sequence of
reactors R1 and R2, a message in the input-pattern I1 of R1 is converted
to some message in the output-pattern O2 of R2. This mechanism is
generalized to a sequence of reactors, R1, R2, ..., Rn, which converts
input-pattern I1 to output-pattern On.






b) Function definition 

We only define the interface of the function when defining a function. The in- 
terface of a function is an input-pattern and an output-pattern. The function- 
body is defined in a reactor. The reactor table of the reactor connects 
the input-pattern and the target function-body on various platforms. The 
function-body creates its output-pattern and sends its message into a field. 

c) Function invocation 

When invoking a function, an invoking reactor sends a message matching the
input-pattern of the function and creates an input-pattern in order to
receive the output message from the invoked function. The invoking reactor
holds a continuation point for resuming its computation after the function
returns, and the message from the invoked function includes a pointer to the
continuation point. This mechanism also covers recursive function
invocation, if the function is defined as a recursive function.

d) If-then-else 

The input-pattern I is the condition of the if-then-else. The reactor Rt
executes the then-part, and the reactor Re executes the else-part. The
input-pattern of Re is the same as the output-pattern O of the reactor R,
except for the input-pattern I. If an output message from the reactor R is
included in the input-pattern I, Rt reacts to the message; otherwise, Re
reacts to the message.

e) Case 

The reactors Rc1 and Rc2 react to an output message from the reactor R
according to their conditions. The conditions for activation are the
input-patterns I1 and I2. When an output message from R matches the
input-pattern I1, Rc1 reacts to the message; when it matches I2, Rc2 reacts.

f) Loop 

When all reactors in a sequence are the same reactor R, the sequence is a
loop computation. The reactor reacts to output messages from itself. The
loop computation continues while the output message is included in the
reactor’s own input-pattern. The loop terminates when its output message
falls outside the input-pattern.

g) Parallel-fork 

If more than one reactor has the same input-pattern, the reactors execute
in parallel. The reactors react to a message in the input-pattern
simultaneously.

h) Synchronous 

A synchronous reactor is a reactor that synchronizes the join of parallel
reactors. The synchronous reactor controls synchronization using
input-patterns for all parallel-forked reactors. The synchronous reactor
waits for output messages from all parallel-forked reactors and sends a
synchronous message when all output messages have arrived. The synchronous
reactor manages the output messages from the parallel-forked reactors.
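
Two of the constructs above, loop and parallel-fork with a synchronous join,
can be sketched in Python as follows. The dispatcher run, the message names
count, done, start, and joined, and the trailing-wildcard reading of "*" are
assumptions of this sketch. Reactors are reduced to (input-pattern, function)
pairs, and the parallel reactors are simulated one after another in a single
process.

```python
WILDCARD = "*"


def matches(pattern, message):
    """Assumed semantics: a trailing "*" matches any remainder of the tuple."""
    if pattern and pattern[-1] == WILDCARD:
        prefix = tuple(pattern[:-1])
        return tuple(message[:len(prefix)]) == prefix
    return tuple(pattern) == tuple(message)


def run(reactors, first_message, max_steps=50):
    """Dispatch each message to every reactor until no new message appears."""
    pending, trace = [tuple(first_message)], []
    while pending and len(trace) < max_steps:
        message = pending.pop(0)
        trace.append(message)
        for pattern, function in reactors:
            if matches(pattern, message):
                reply = function(message)
                if reply is not None:
                    pending.append(tuple(reply))
    return trace


# f) Loop: the reactor reacts to its own output while the output stays
# inside its input-pattern ("count", *); ("done",) falls outside and stops it.
def countdown(message):
    n = int(message[1])
    return ("count", str(n - 1)) if n > 1 else ("done",)


loop_trace = run([(("count", "*"), countdown)], ("count", "3"))
print(loop_trace)


# g) + h) Parallel-fork and synchronous join: two reactors share one
# input-pattern; the join reactor waits until both have reported.
def make_join(expected):
    arrived = set()

    def join(message):
        arrived.add(message[1])
        return ("joined",) if arrived == expected else None

    return join


fork_join = [
    (("start", "*"), lambda m: ("done", "A")),
    (("start", "*"), lambda m: ("done", "B")),
    (("done", "*"), make_join({"A", "B"})),
]
join_trace = run(fork_join, ("start",))
print(join_trace)
```

The max_steps bound is a safety net of the sketch only; in the model itself a
loop ends because its output message leaves the input-pattern.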

An example of a program of pattern directed message collaboration is shown
in figure 4. The collaboration through patterns is shown in figure 4-(b).
The semantics of the collaboration program is the same as that of the
program shown in figure 4-(a). This collaboration-based computation scheme
is quite similar to the dataflow-based computation scheme. We use text
tuples as messages in the example. The input interface of the function f is
the input-pattern [f, Req, *], and the output interface of the function f is
the output-pattern [f, Ret, *]. When the message [f, Req, Task1, param]
appears in the field, the reactor Rs1 reacts to the message and executes its
task. Then Rs1 sends a message matching [f, Ret, Rs1, *] as a return
message, and a reactor that calls the function f reacts to the return
message through its input-pattern [f, Ret, *].



function f (string_list) -> string_list

if string_list = "Task1" then Rs1(string_list)
else if string_list = "Task2" then Rs2(string_list)

(a) Program

[Diagram: a request matching [f, Req, *] is dispatched to reactor Rs1 through
input-pattern [f, Req, Task1, *] or to reactor Rs2 through input-pattern
[f, Req, Task2, *]. Rs1 and Rs2 return messages matching [f, Ret, Rs1, *]
and [f, Ret, Rs2, *], both of which are included in [f, Ret, *].]

(b) Pattern directed message collaboration

Fig. 4. Example of collaboration program



The program constructs are described by patterns in a field. Each entity
of execution is a reactor on heterogeneous platforms. Therefore, pattern
directed message collaboration makes it possible to write collaboration
programs among agents on heterogeneous platforms using patterns.

Furthermore, the programming method of multi-agent collaboration is
flexible, because the program can be changed dynamically by changing
patterns. For example, adding an agent to the field or deleting an agent
from the field changes the relations of patterns according to the
input-pattern and the output-pattern of the agent. This changes the
collaboration program. Therefore, pattern directed message collaboration
provides dynamic collaboration among agents.

Moreover, the selection of parallel processing reactors can be controlled
by another reactor in pattern directed message collaboration. To select
among the candidates for reaction, we use the contract net protocol [5]. The
contract net is also constructed with pattern directed message
collaboration, and works as follows: 1) A message appears in a field. 2)
Each candidate reactor for parallel processing sends a bid message into the
field. 3) The arbiter reactor reacts to the bid messages and selects one or
more reactors from the candidates using its own decision policy. 4) The
arbiter reactor sends an award message into the field. 5) The selected
reactors react to the award message and execute their own reactions to the
original message.

This arbitration mechanism based on contract-net provides not only the col- 
laboration among agents, but also control of the collaboration by another agent. 
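
The five steps of the contract net can be traced with a short Python sketch.
Several details are assumptions made here rather than statements of the
paper: the message names task, bid, award, and result, the helper names, the
trailing-wildcard reading of "*", and in particular the decision policy,
which simply awards a task to the first bid that arrives.

```python
WILDCARD = "*"


def matches(pattern, message):
    """Assumed semantics: a trailing "*" matches any remainder of the tuple."""
    if pattern and pattern[-1] == WILDCARD:
        prefix = tuple(pattern[:-1])
        return tuple(message[:len(prefix)]) == prefix
    return tuple(pattern) == tuple(message)


def run(reactors, first_message, max_steps=50):
    """Dispatch each message to every reactor until no new message appears."""
    pending, trace = [tuple(first_message)], []
    while pending and len(trace) < max_steps:
        message = pending.pop(0)
        trace.append(message)
        for pattern, function in reactors:
            if matches(pattern, message):
                reply = function(message)
                if reply is not None:
                    pending.append(tuple(reply))
    return trace


def make_candidate(agent_id):
    """Reactor table of one candidate: bid on tasks, serve awarded tasks."""
    def bid(message):      # step 2: ("task", task_id, payload) -> bid
        return ("bid", message[1], agent_id)

    def serve(message):    # step 5: react only to an award naming this agent
        if message[2] == agent_id:
            return ("result", message[1], agent_id)
        return None

    return [(("task", "*"), bid), (("award", "*"), serve)]


def make_arbiter():
    """Reactor table of the arbiter, using a first-bid-wins policy."""
    pool = set()

    def store(message):    # remember the open task
        pool.add(message[1])
        return None

    def select(message):   # steps 3 and 4: pick a bidder and send the award
        task_id, bidder = message[1], message[2]
        if task_id in pool:
            pool.remove(task_id)
            return ("award", task_id, bidder)
        return None        # the task has already been awarded

    return [(("task", "*"), store), (("bid", "*"), select)]


reactors = make_arbiter() + make_candidate("w1") + make_candidate("w2")
trace = run(reactors, ("task", "t1", "translate a.ps"))
for message in trace:
    print(message)
```

A different decision policy, for example one that waits for several bids and
compares them, would only change the select function of the arbiter; the
candidates and their patterns stay as they are.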

Additionally, we can consider a sequence as one function. The function
wraps the reactors in the sequence and, in effect, shrinks their reactions
into one. The shrunken reactions help us to understand an overview of the
collaboration.



4 Example of Multi-agent Collaboration on Field Reactor 
Model 

We show an example of pattern directed message collaboration in FRM. This
example shows how FRM provides flexible collaboration.

We demonstrate file format translation as an example. The example extends
its function dynamically. This file format translation system is not merely
conceptual; it has actually been developed and is running in a LAN
environment.




Fig. 5. System diagram of the example. Arbiter Agent: the reactor arbitrates
which service reactor performs a translation. There is only one arbiter
agent in the field. Task Request Agent: the reactors send a task message
into the field in response to a request from the user interface component.
Factory Agent: the reactors collect jpeg files translated by service agents,
generate an html document, and start up the web browser with the html
document. WinApp2PS Agent: the reactors are service reactors that make
existing Windows applications translate application documents into
PostScript files using a PostScript print driver. The reactors do not appear
in version 1. Looper Agent: the reactor generates a request message from a
task complete message. This reactor also does not appear in version 1. Only
one looper agent exists in the field. PS2JPEG Agent: the reactors are
service reactors that translate PostScript files into JPEG files using an
existing component.
- Version 1: Displays the contents of PostScript files through the
collaboration of agents. Collaboration among agents on heterogeneous
platforms enables a machine that cannot display the contents of a PostScript
file to display the contents using the web browser on the machine. The
example also shows that load balancing among agents is controlled
automatically without making special provisions.

- Version 2: Extends the system to translate file formats. The system
extends its function to translate printable file formats only by adding two
kinds of agents, without recompiling the existing agents or restarting the
system.

The diagram of the system is shown in figure 5. The system consists of a
field and reactors on heterogeneous platforms. The field is implemented with
unicast communication using Pathwalker [8] [9], which provides asynchronous
message communication. In the real system, we use the agent communication
language [6] of FIPA [7] and adopt XML for exchanging messages among agents.
In this example, a pattern and message description is a tuple of texts, e.g.
[a, b, c, uc], for ease of understanding.




4.1 Version 1 

Assume at first that the machine cannot display the PostScript file format
by itself. Agents on heterogeneous platforms enable the machine to display a
PostScript file as an html document through their collaboration. An overview
of the system at version 1 is shown in figure 6. The task-request agent on
machine A sends a message requesting translation into the field. PS2JPEG
agents on other machines react to the message with their input-pattern. One
of them is selected and translates a PostScript file into jpeg files, and
then sends an inform message when the task is completed. The factory agent
reacts to the task completion message, gets the translated files, and starts
up the web browser.

The collaboration among agents is conducted by pattern directed message
collaboration. The details of the pattern directed message collaboration are
shown in figure 7.

In order to decide which service agent executes, we use the contract net
protocol. The service agents (PS2JPEG agents) react to messages included in
the pattern [Request, PS, *]. Free service agents, which are not executing
tasks, send bid messages into the field. An arbiter agent stores the task
message in its message pool, and decides which service agent will execute
the task.
Fig. 7. Pattern directed message collaboration at Version 1. 1) A task
request agent sends a request message for translation of PostScript format.
2’) The arbiter agent reacts to the request message and stores it in its
pool. 2) Service agents react to the request message. 3) Each service agent
sends a bid message with its own agent ID into the field. 4) The arbiter
agent reacts to the bid messages. 5) The arbiter agent decides which service
agent will execute the task held in the pool, sends a serve message with the
service agent’s ID into the field, and deletes the task from the pool. 6)
One service agent reacts to the serve message and the ID in the message, and
translates the PostScript format into jpeg format. 7) The service agent
sends an inform message as the task complete message. 8) The factory agent
reacts to the inform message and starts up the web browser with the
translated file.



The arbiter agent controls the load among the service agents. The arbiter
agent selects only free agents, because only free service agents send bids.
System capacity can be changed at any time by adding or deleting service
agents according to the system load.

Thus, the system enables machine A, which cannot display the PostScript
file format, to display the contents using the web browser on machine A
through the collaboration among distributed agents. The agents translate the
file format using the existing components on heterogeneous platforms.

We can obtain an overview of the system by shrinking reactions. Shrinking
the reactions makes the system behavior clear. The system converts messages
from the pattern [Request, PS, *] to the pattern [Inform, JPEG, *]. Figure 6
shows the shrunken form of the reactions in figure 7.



4.2 Version 2 

The system dynamically extends its function to translate printable
documents into html documents by adding two kinds of agents without
recompiling or restarting the system. One is a WinApp2PS agent that
automatically prints an application document to a PostScript file using a
PostScript print driver. The other is a Looper agent that converts a task
completion message into a task request message. The Looper agent creates a
translation loop that continues while target files are generated; the loop
terminates when no target file is specified in the input-pattern of a
message to the Looper agent.

A user requests the system to translate a user file into an html file. User 
files are translated into PostScript files by the WinApp2PS agent, and then the 
PostScript files are translated into JPEG files by the PS2JPEG agent. 




Fig. 8. Pattern directed message collaboration at version 2. 1) The task
request agent sends a request message for the translation of an application
file, e.g. PowerPoint. 2) WinApp2PS agents react to the request message and
bid. 3) One of the WinApp2PS agents is selected by the arbiter through the
serve message to execute the task, translates the application file to a
PostScript file, and sends an inform message. 4) The Looper agent reacts to
the inform message, unless it reports a translation to a jpeg file, and
sends a request message for translation of PostScript format. 5) PS2JPEG
agents react to the request message and bid. 6) One of the PS2JPEG agents is
selected by the arbiter to execute the task and sends an inform message. 7)
The factory agent reacts to the inform message and starts up the web browser
with the translated files.



Figure 8 shows the message patterns of the system after it is extended by
adding the new agents. The input-patterns and the output-patterns of the
WinApp2PS agents include each pattern of PS2JPEG agents. The Looper agent
converts the input-pattern [Inform, *] to the output-pattern [Request, *];
however, the Looper agent does not convert the messages included in the
pattern [Inform, JPEG, *].
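
The extension from version 1 to version 2 can be imitated in Python by
adding two entries to a list of reactors. This is a reduced sketch, not the
running system: the arbiter and the bidding are left out, the format tag PPT
and the file name are invented, the helper names are hypothetical, and "*"
is assumed to be a trailing wildcard.

```python
WILDCARD = "*"


def matches(pattern, message):
    """Assumed semantics: a trailing "*" matches any remainder of the tuple."""
    if pattern and pattern[-1] == WILDCARD:
        prefix = tuple(pattern[:-1])
        return tuple(message[:len(prefix)]) == prefix
    return tuple(pattern) == tuple(message)


def run(reactors, first_message, max_steps=50):
    """Dispatch each message to every reactor until no new message appears."""
    pending, trace = [tuple(first_message)], []
    while pending and len(trace) < max_steps:
        message = pending.pop(0)
        trace.append(message)
        for pattern, function in reactors:
            if matches(pattern, message):
                reply = function(message)
                if reply is not None:
                    pending.append(tuple(reply))
    return trace


displayed = []


def ps2jpeg(message):    # [Request, PS, file] -> [Inform, JPEG, file]
    return ("Inform", "JPEG", message[2])


def factory(message):    # stands in for starting the web browser
    displayed.append(message[2])
    return None


def winapp2ps(message):  # [Request, PPT, file] -> [Inform, PS, file]
    return ("Inform", "PS", message[2])


def looper(message):     # [Inform, *] -> [Request, *], except [Inform, JPEG, *]
    if matches(("Inform", "JPEG", "*"), message):
        return None
    return ("Request",) + tuple(message[1:])


version1 = [
    (("Request", "PS", "*"), ps2jpeg),
    (("Inform", "JPEG", "*"), factory),
]
# Version 2 only appends reactors; the version 1 entries are left untouched.
version2 = version1 + [
    (("Request", "PPT", "*"), winapp2ps),
    (("Inform", "*"), looper),
]

before = run(version1, ("Request", "PPT", "slides"))
after = run(version2, ("Request", "PPT", "slides"))
print(before)
print(after)
print(displayed)
```

In version 1 nothing reacts to the application-file request, so the trace
stops after the first message. In version 2 the same request is carried
through PostScript to JPEG, and the chain ends once a message in
[Inform, JPEG, *] appears.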

By shrinking reactions, we get an overview of the sequences of version 2,
shown in figure 9. We can recognize that a loop has been created in the
system. The looping of message patterns is controlled by the Looper agent.
Once the task request message appears in the field, the agents repeatedly
transform the messages. When a message included in the pattern
[Inform, JPEG, *] appears in the field, the loop terminates.
The system extends its function to translate printable documents into html
documents. The system repeats each step of the translation task until jpeg
files are finally generated. In this example, the user file is translated
into a PostScript file by WinApp2PS agents, and then translated into a JPEG
file by PS2JPEG agents. If the user file is a PostScript file, the file is
only translated into a JPEG file. The user on machine A need not be aware of
whether the machine has applications to display the files.

This example shows that users need not worry about the resources and
applications of computers. However, the example does not show all of
ubiquitous computing. We consider that, to realize ubiquitous computing, it
is important that users do not have to consciously use applications.
Providing a seamless application environment is one aspect of the
environment of ubiquitous computing.



5 Related Work 



FRM supports agent communication using a field, and provides pattern-oriented
collaboration. MARS (Mobile Agent Reactive Spaces) [10] is a mobile-agent
platform using Linda [11] which supports indirect communication among agents.
Adaptive Agent Oriented Software Architecture (AAOSA) [12] is a new system
architecture based on a multi-agent system. Agent-based distributed
information processing system (ADIPS) [13] is a flexible distributed system
framework. FRM uses pattern matching for collaboration among agents. Much
research on pattern matching has been done in Prolog [14].

MARS also has a reactive mechanism, and Linda is a kind of medium. Agents
receive messages filtered through Linda patterns written by other agents. It
is possible to apply FRM to MARS. In that case, the reactive agents of MARS
work as FRM reactors and Linda works as a field. Hence, agents in MARS are able
to collaborate with each other using pattern directed message collaboration.

AAOSA is a multi-agent architecture. Agents directly communicate with each
other through one-to-many communication. Each agent understands messages
based on its own interpretation policy. One difference between FRM and AAOSA
is the communication style. An agent on AAOSA sends a message to other agents
directly using its own Address-Book. The Address-Book keeps an address list of
other agents for sending messages. The Address-Book has the same function as
the field, and the concept of the FRM field will give a basis on which to
devise many-to-many message communication in AAOSA.

ADIPS is an agent-based framework that supports the construction of flexible
distributed systems. One difference between FRM and ADIPS is the method of
constructing agent collaboration. On ADIPS, an agent arranges collaboration
among agents. The pattern directed message collaboration in FRM will provide
ADIPS with a method of constructing collaboration among agents without such
arranging agents.

Prolog has a powerful pattern matching mechanism, called unification, that
provides type-matching. However, collaboration among agents does not need
pattern matching as powerful as unification. We consider simple pattern
matching, such as text matching, sufficient for creating systems. Our aim is
to create a model that supports flexible collaboration among agents on
heterogeneous platforms. Where more powerful collaboration is needed, we can
adopt work on pattern matching in Prolog in FRM.
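
As a rough illustration of what simple text matching can mean in practice, the
following sketch uses shell-style wildcards from Python's standard `fnmatch`
module in place of unification. The space-separated message encoding and the
function name `reacts` are assumptions made for this example only. Note that
`fnmatch` treats square brackets as character classes, so the patterns here are
written without them.

```python
from fnmatch import fnmatchcase


def reacts(pattern, message_text):
    """True if the message text fits the wildcard pattern (case-sensitive)."""
    return fnmatchcase(message_text, pattern)


print(reacts("Inform JPEG *", "Inform JPEG report.jpg"))  # True
print(reacts("Inform JPEG *", "Inform PS report.ps"))     # False
print(reacts("Inform *", "Inform PS report.ps"))          # True
```

Unlike unification, this matching binds no variables and works in one direction
only, which is all that the reactors described in this paper require.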



6 Conclusion 



We presented a flexible multi-agent model, called the Field Reactor Model
(FRM), for open system environments such as ubiquitous computing. A field is
an abstract medium, which is independent of the physical media, and dispatches
messages to all agents. Several fields on various networks can be combined
into one logical field that enables agents to collaborate across various media
and heterogeneous platforms. Reactors are agents in the field, which have
transformation rules for messages using pattern matching. The collaboration
method among reactors is a pattern directed message collaboration that yields
functional relations between patterns of reactors. The pattern directed
message collaboration not only provides a flexible method of programming
collaboration among agents using patterns, but also allows the collaboration
among agents to be controlled by other agents. The FRM provides flexible and
dynamic collaboration among agents. We described how to apply the computation
scheme originating from dataflow to the pattern directed message
collaboration. We also showed the flexibility of the model with an example.
The example demonstrates file format translation through the collaboration of
reactors on heterogeneous platforms, and dynamic change of system behavior.

We consider that FRM does not suit strict systems such as banking backbone
systems. The FRM suits open system environments such as the Internet and
provides flexible collaboration among agents.

Our ongoing work and further research includes a formal definition of the
FRM, development of more efficient federation, designing pattern directed
message collaboration, and the evaluation of the FRM.

We are studying a mediator [15] as an efficient federation of gateway reactors.
The mediator, which we are now proposing to FIPA, will intercede with servers
on behalf of clients.

Abstract. It is well recognized that Agent Communication Languages
(ACL’s) are a critical element of Multi-Agent Systems and a key to their
successful application in commerce and industry. The field of protocol
engineering, which addresses the problems of specifying and verifying
machine communication languages and testing implementations, has de- 
veloped powerful theoretical and automated techniques for doing this, 
and more importantly, a mature understanding of the requirements that 
communication language and protocol specifications should meet. Unfor- 
tunately, those developing and promulgating ACL’s appear not to have 
taken advantage of this body of knowledge. An examination of the cur- 
rent ACL specifications being developed by the Foundation for Intelligent 
Physical Agents (FIPA) reveals a confusing amalgam of different formal
and informal specification techniques whose net result is ambiguous, in- 
consistent and certainly under-specified. Allowances must be made, as 
these are draft specifications, but rather than providing a verified founda- 
tion for reliable communication between heterogeneous agents, they seem 
likely to lead to a host of unreliable and incompatible implementations, 
or to be ignored in favour of more pragmatic and robust approaches. 
In this paper, we propose a set of requirements against which an ACL 
specification can be judged, briefly explore some of the shortcomings of 
the FIPA ACL and their origins, and contrast it with a small ACL which
was designed with reliability and ease of verification as prime objectives. 



1 Introduction 

It is well recognized that Agent communication languages (ACL’s) are a criti- 
cal component of a multi-agent system (MAS), doubly so for systems that are 
intended to be open, allowing agents with different origins and architectures to 
communicate and cooperate. Indeed, the ability to communicate in an ACL is 
often regarded as the key feature that distinguishes software agents from other 
software components [7]. Understanding the importance of ACL’s to the adop- 
tion of MAS technology by industry, and perhaps fearing the emergence of a 
Babel of incompatible languages, agent researchers in academia and industry 
have jointly invested considerable effort into ACL design. 

Examples of this are the widely used KQML [2], which emerged in 1993 as 
a result of collaborative research in the DARPA Knowledge Sharing Effort and 
is perhaps now a de-facto standard (though many variants exist), and the more
recent efforts of the Foundation for Intelligent Physical Agents (FIPA) to
establish and promulgate (amongst various other standards) a standard ACL [5],
the latest revision of which [6] is “out for comment” and due for official
release in October of this year.

Unlike more traditional machine communication languages (MCL’s), which 
focus on data transfer and associated or unrelated control operations, the focus 
of ACL’s is upon the communication of knowledge and communication about 
acting, where acting includes communicative actions themselves, often called 
performatives. Here we will adopt the more neutral “message” and assume that 
sending a single message is the only possible communicative action, so that the 
message itself (assuming it specifies its intended recipients) can be said to fully 
characterize the action, though not the context in which it occurs. The content of 
an ACL message is thus usually required to be either a proposition or a descrip- 
tion of an action. Perhaps for this reason, ACL’s are invariably characterized as 
being “high-level” languages. In the words of the original KQML specification [2]: 

KQML is intended to be a high-level language to be used by knowledge- 
based systems to share knowledge at run time. ... KQML is comple- 
mentary to new approaches to distributed computing, which focus on 
the transport level. For example, the new and popular Object Request 
Broker [OMG ORB] specification defines distributed services for inter- 
process and inter-platform messaging, data type translation, and name 
registration. It does not specify a rich set of message types and their 
meanings, as does KQML. 

The FIPA ACL specification [6] says of itself:

[The] specification defines a language and supporting tools, such as pro- 
tocols, to be used by intelligent software agents to communicate with 
each other. The technology of software agents imposes a high-level view 
of such agents . . . the mechanisms used support such a higher-level, often 
task based, view of interaction and communication.

and defines an ACL, and says of the content of a message: 

A language with precisely designed syntax, semantics and pragmatics 
that is the basis for communication between independently designed and 
developed software agents ... a content language must be able to express 
propositions, objects, and actions. 

This reveals another sense in which ACL’s are high-level: they are abstract 
languages which allow that the message content may be in any of a variety 
of (separately specified) content languages. The intention here is not only to 
encourage diversity, allowing the independent development of application and 
domain specific content languages, but also to permit some message processing 
to be done without a need to understand the content. Moreover, because sending 
a message is also an action, another ACL message may itself be the content; the
language is thus recursively defined.
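
The recursive structure can be made concrete with a small datatype sketch. The
class and field names below are illustrative assumptions and are not taken from
KQML or FIPA; the point is only that the content of a message may be a
proposition, a description of an action, or another message.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Proposition:
    text: str


@dataclass(frozen=True)
class Action:
    actor: str
    description: str


@dataclass(frozen=True)
class Message:
    msg_type: str
    sender: str
    receiver: str
    content: "Content"  # may itself be a Message, hence the recursion


Content = Union[Proposition, Action, Message]


def nesting_depth(message: Message) -> int:
    """Count how many messages are nested inside one another."""
    depth = 1
    content = message.content
    while isinstance(content, Message):
        depth += 1
        content = content.content
    return depth


# A asks B to inform C that it is raining: a message about a message.
inner = Message("inform", "B", "C", Proposition("It is raining"))
outer = Message("request", "A", "B", inner)
print(nesting_depth(outer))  # 2
```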

As a result of this focus on knowledge and action, ACL messages are usu- 
ally given meaning in terms of or by reference to speech act theories [1,16] and 
abstract models of agent “mental states” [15]. As well as their explicit content, 
messages are interpreted as implicitly communicating the beliefs, desires or in- 
tentions to know or act of the sender. Thus for example, a tell message whose 
content is “It is raining” is taken to communicate not just that proposition, but 
also that the sender believes it, and perhaps also wants the recipient to believe 
it and believes she does not believe it and wants to know it. There are a num- 
ber of recognized problems [17,20,12] that arise from this approach to giving a 
semantic characterization of messages, and others that have been less discussed, 
some of which we will address in this paper. 

Despite the difference in focus, ACL’s share with lower-level MCL’s the pur- 
pose of enabling efficient, reliable interoperation of disparate software compo- 
nents in a distributed environment. The subfield of computer science and soft- 
ware engineering which has traditionally addressed this problem is known as 
protocol engineering. Substantial progress has been made in this field in un- 
derstanding how communication languages and protocols may be adequately 
specified and verified to be “correct”, and how particular implementations of a
language may be tested for conformance to a specification and for interoperabil- 
ity. Note particularly that verification here refers to the absence of errors and 
the suitability for its purpose of the specification itself. Verifying that an im- 
plementation behaves correctly is called conformance testing which, along with 
interoperability testing, falls within the ambit of certification. 

Reliability is perhaps the key issue, and modern approaches to protocol en- 
gineering rely heavily on Formal Description Techniques (FDT’s) for specifica- 
tion, and techniques for automated verification and machine supported testing 
to guarantee it. In this paper we will argue that currently available ACL’s, in 
particular the draft FIPA standard, are likely to be disappointing in this respect,
for several reasons: the specification techniques adopted are insufficiently formal 
and coherent, leading to problems in verifying properties of the specification, the 
meaning given to messages and their parameters is inadequately defined, and the 
extent and quality of its protocols is deficient. Perhaps most critically, the key 
notion of an interaction is left undefined. Moreover, as the FIPA ACL
specification acknowledges, the problem of verifying conformance has been completely
side-stepped. Given the way in which the ACL semantics is given, it appears for 
many agents to be impossible [20] or effectively meaningless [17]. 

The focus of this paper is not upon the conformance testing problem, but 
upon the specification and verification problems, and the quality and
verifiability of the FIPA ACL specification in particular. We aim not to carp from the
sidelines, as the objectives and efforts of those involved in the development of 
standard ACL’s are laudable, though it is reasonable to question whether the 
effort to promulgate standards is premature and whether theoreticians should 
be the key players in that effort^. Rather, we aim to contribute a different
perspective to that effort by proposing a set of basic requirements against
which an ACL specification can be judged, briefly exploring some of the
shortcomings of the FIPA offering and their possible origins, and contrasting
it with a small, pragmatic ACL and its associated protocols which were designed
with reliability and ease of verification as their prime objectives. We begin
by considering generic requirements for specification of ACL’s.



2 Requirements for ACL Specification 

The basic requirements for any specification are clarity, consistency, and com- 
pleteness. If these are met then ambiguity which results from inconsistency, or 
gaps in the specification which are open to interpretation, is eliminated. It should 
then be possible to determine whether the object of the specification meets its 
requirements, provided of course that those requirements have themselves been 
adequately specified. 

In the case of an ACL, the functional requirements are that it should enable
agents to communicate about domain knowledge and action; the operational
requirements are that it should be implementable, efficient, reliable, and
robust in the face of possible errors, including errors in implementation.
From this flow meta-requirements: that the consistency and completeness of the
specification be verifiable (clarity can only be assessed), ideally by
automated techniques, and that the conformance of an implementation to the
specification (or some part of it) be verifiable or at least testable.

Good protocol engineering practice dictates that the first meta-requirement 
can be achieved by the adoption of Formal Description Techniques (FDT’s) 
for specification, and the subsequent application of automated verification tech- 
niques to the specification. FDT’s are closely related to the area now known 
as Formal Methods, and different FDT’s are appropriate for different elements 
of an ACL specification. Examples are the use of Abstract Datatype languages 
such as ACT ONE [4] and context free grammars for the specifications of message 
syntax, and state machines, CCS-like languages such as LOTOS [8] and temporal 
logics for the specification of protocols. The use of FDT’s has several advantages 
over informal techniques. 

— Formal languages encourage completeness during specification development. 

— The meaning of a consistent, complete specification is unambiguous. 

— The specification is machine readable and so amenable to automated verifi- 
cation of its consistency and completeness, and also of functional properties 
such as safety and liveness of interactions, e.g., absence of deadlock. 

— Specifications can be processed by tools such as parser and translator gen- 
erators. 
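
To give a flavour of what automated verification of a machine-readable
specification can look like, the sketch below checks a toy protocol, written as
a single finite state machine, for reachable states in which the interaction
can neither continue nor has properly ended. The protocol, its state names, and
the function name are assumptions invented for this illustration. Real
verification tools explore the joint behaviour of several communicating
processes, which this sketch does not attempt.

```python
from collections import deque


def find_deadlocks(transitions, initial, finals):
    """Return reachable non-final states that have no outgoing transition."""
    reachable = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for _message, target in transitions.get(state, []):
            if target not in reachable:
                reachable.add(target)
                queue.append(target)
    return sorted(
        s for s in reachable if s not in finals and not transitions.get(s)
    )


# A hypothetical request protocol: state -> [(message type, next state), ...]
complete = {
    "idle": [("request", "waiting")],
    "waiting": [("agree", "working"), ("refuse", "done")],
    "working": [("inform", "done")],
}
# The same protocol with the rule for the "working" state forgotten.
incomplete = {k: v for k, v in complete.items() if k != "working"}

print(find_deadlocks(complete, "idle", {"done"}))    # []
print(find_deadlocks(incomplete, "idle", {"done"}))  # ['working']
```

A gap of this kind is easy to overlook in a narrative specification, whereas a
mechanical check finds it as soon as the specification is written down formally.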

^ These issues were much discussed at a panel-session on the subject of agent standards 
at the Agents’ World conference in Paris last year. 

An ACL specification usually consists of a number of different elements, and
in the sections that follow we will consider specific requirements for each. But 
first, some general principles applying to any standard are worth mentioning. 

Requirement 1 Standard ACL specifications should: 

— precisely delimit their own purpose and scope,

— not employ narrative as a primary specification technique, 

— be formal, concise, well organized, and amenable to verification,

— explicitly and unambiguously define requirements, options, and prohibitions, 

— declare exactly what constitutes compliance to the specification, and 

— indicate precisely which (if any) elements are subject to future change. 

The FIPA ACL specification (henceforth, FIPA) scores poorly on all but the
first item. The specification relies on a mix of informal narrative,
requirements statements, tables, logical formulae, and diagrams. None of these
elements are adequately formal, and the result is often inconsistent at a
fundamental level. For example, Requirement 1 dictates that “Agents should
send not-understood [messages] if they receive a message that they do not
recognize . . .”, while Requirement 2 states that “An ACL compliant agent may
choose to implement any subset of the predefined messages types . . .”,
including subsets which do not contain not-understood!



2.1 Message Syntax 

Unlike human communication, messages in an artificial language for machine 
communication are typically explicitly typed data structures that carry named 
parameters whose value domains and interpretation must be specified. We as- 
sume here that such a message structuring technique is adopted. For clarity 
and to avoid unduly restricting implementations, messages should be specified 
in a manner that abstracts from details of how they might be encoded and 
transported, i.e. in an abstract syntax in terms of a set of primitive, abstract 
datatypes. 

The purpose of typing is to impose a partition or classification on messages to 
simplify their specification and to facilitate message processing: the partition can 
be coarse or fine-grained. Usually each type has a quite distinct communicative 
function. Overly fine partitions tend to result in a large specification and many 
or complex protocols, so parsimony is a virtue. If the datatypes of parameters do 
not depend on the message type, the message syntax is uniform and they may 
be defined globally rather than separately for each message type. Further, if the 
permissibility of or requirement for a parameter is independent of the message 
type, the message syntax is highly uniform. Uniformity of the message syntax is 
desirable, but a highly uniform syntax tends to admit messages whose meaning 
may be unclear. For example, it may not be clear whether a message that contains 
an in-reply-to parameter sent to a recipient who has never sent a message to the 
sender is an error or merely redundant. This is of course a semantic issue which 
is not the responsibility of the message syntax specification to address, but in
some sense a problem that its design may have caused.

In ACL’s, it is usual to distinguish and specify separately two parts of message 
syntax: usually called structure and content. The intention is to be quite abstract, 
permissive and flexible about the form of one parameter value, the content or 
“inner message”, allowing it to be expressed in any one of a variety of content
languages, while being quite concrete and restrictive about the language of the 
remainder of the message, so that its type can be recognized and non-content 
parameters interpreted without any need to understand the content. In addition, 
to allow communication in a specific content language about different domains of 
discourse, it is usual to allow a further separate specification: an ontology which 
describes the form and meaning of the vocabulary pertinent to a particular 
domain. The content language and ontology are usually indicated explicitly by 
standard parameters, as in the example below. 

Finally, a message syntax specification may in effect constitute multiple spec- 
ifications of explicit variants or subsets, and may also address implementation 
issues such as word size, string length, etc. Care must be taken in how this is 
done to avoid compromising the integrity and clarity of the specification and its 
ability to be processed automatically. 

Requirement 2 Message syntax should be specified in a formal language in 
terms of primitive abstract datatypes so as to: 

— enumerate the message types, 

— enumerate the standard parameter names, and say whether and what non- 
standard parameters are permitted, 

— define the syntax of each message type, namely the required and optional 
standard parameters, the datatypes of their values, and any other structural 
restrictions, such as the order in which parameters occur, and 

— allow it to be determined whether any given message is syntactically valid. 
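
The four items of Requirement 2 can be illustrated with a deliberately small,
machine-readable syntax specification and a validity check derived from it. The
message types, the required-parameter sets, and the function name below are
hypothetical choices for this sketch; they are not the FIPA or KQML definitions.

```python
# Hypothetical, deliberately small specification (not FIPA's or KQML's).
MESSAGE_TYPES = {"inform", "request", "not-understood"}

# Standard parameter names and the datatype of each value, defined globally
# rather than per message type (a uniform syntax in the sense used above).
STANDARD_PARAMS = {
    "sender": str, "receiver": str, "content": object,
    "in-reply-to": str, "reply-with": str, "language": str, "ontology": str,
}

# Required standard parameters for each message type; all others are optional.
REQUIRED = {
    "inform": {"sender", "receiver", "content"},
    "request": {"sender", "receiver", "content", "reply-with"},
    "not-understood": {"sender", "receiver", "in-reply-to"},
}


def syntax_errors(msg_type, params):
    """Return a list of syntax errors; an empty list means the message is valid."""
    if msg_type not in MESSAGE_TYPES:
        return ["unknown message type: " + msg_type]
    errors = []
    for name in sorted(REQUIRED[msg_type] - set(params)):
        errors.append("missing required parameter: " + name)
    for name, value in params.items():
        if name not in STANDARD_PARAMS:
            # This sketch assumes non-standard parameters are prohibited.
            errors.append("non-standard parameter not permitted: " + name)
        elif not isinstance(value, STANDARD_PARAMS[name]):
            errors.append("wrong datatype for parameter: " + name)
    return errors


print(syntax_errors("request", {"sender": "a", "receiver": "b", "content": "x"}))
```

Because the specification is data rather than narrative, the question of
whether a given message is syntactically valid has exactly one answer.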

FIPA employs a concrete Lisp S-expression message syntax, which is defined
informally by example, narrative, a table of standard parameters and a
catalogue of message types. A prototypical message is:

(inform
  :sender agent1
  :receiver hpl-auction-server
  :content (price (bid good02) 150)
  :in-reply-to round-4
  :reply-with bid04
  :language sl
  :ontology hpl-auction
)

The syntax is highly uniform: it is explicitly stated that no standard parame- 
ters are prohibited and only one is mandatory, yet throughout the narrative it is 
often implied that others are required. The catalogue of message types describes
type-specific restrictions only on the content parameter, not on other standard
parameters. By contrast, the latest KQML proposal [10] is far more systematic
and explicit in defining requirements and restrictions. Both however suffer from
a surfeit of specialized message types.



2.2 Message Transport 

An ACL specification must define some (perhaps minimal) set of characteristics 
and assumptions about the behaviour of the underlying message transport layer 
which is responsible for accepting messages for delivery, determining whether 
it is possible to initiate delivery, and then subsequently doing so, successfully 
or otherwise. This has two main aspects: a specification of requirements for the 
behaviour of the message-passing interface at the agent / platform boundary; and 
a specification of how the transport layer may behave with regard to message 
delivery, what guarantees it must make, how errors may be handled, etc. In addi- 
tion one must specify exactly how messages in the abstract syntax are encoded 
and decoded into bit-streams, but this does not have to be part of the ACL 
specification: ACL’s usually permit this to be done in any way that preserves 
the message’s integrity, often by reference to other standards. A particular
aspect of message transport that should however be made precise is which set of
message parameters are interpreted by the transport layer, which may actually
be responsible for adding some parameters, such as the sender, to the message.

Requirement 3 A transport layer specification should define:

— the subset of syntactically valid messages that will be accepted for delivery,
and how the interface will behave on those outside this set,

— what messages will possibly be delivered, how messages may be modified, and
how message delivery failure will be handled, and

— other issues, such as whether the interface is blocking or non-blocking, the
circumstances under which it may or will block, guarantees of sequencing,
etc.

FIPA enumerates the requirements that a transport layer must meet, but appears
to be silent on the issue of which parameters it may add or modify. For example, 
it is unclear whether an agent can send a message anonymously or forge one 
that appears to the recipient to come from a third party by omitting or faking 
the sender field. 
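
One way a transport specification could settle the question is to state that
the sender parameter is always written by the transport layer. The sketch below
illustrates that single design choice under assumed names (`Transport`,
`register`, `send`); it is not a description of any FIPA platform, and a real
transport would also have to specify blocking behaviour, sequencing, and
failure handling.

```python
class Transport:
    """Toy in-memory transport that stamps the sender on every message."""

    def __init__(self):
        self.inboxes = {}

    def register(self, agent_name):
        self.inboxes[agent_name] = []

    def send(self, authenticated_sender, message):
        """Deliver a copy of the message; return False if delivery is impossible."""
        receiver = message.get("receiver")
        if receiver not in self.inboxes:
            return False  # rejected at the interface: unknown receiver
        delivered = dict(message)
        # Any sender value supplied by the agent is overwritten, so a message
        # can be neither anonymous nor forged under this rule.
        delivered["sender"] = authenticated_sender
        self.inboxes[receiver].append(delivered)
        return True


transport = Transport()
transport.register("A")
transport.register("B")
transport.send("A", {"receiver": "B", "sender": "C", "content": "hello"})
print(transport.inboxes["B"][0]["sender"])  # A
```

Whether anonymity should instead be permitted is a policy question; the point
is that the specification must say which it is.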



2.3 Interactions 

Specifications of message syntax and transport interface, taken together, define 
the set of syntactically valid messages that will be accepted for delivery, the 
circumstances under which they will be successfully delivered, and the expected 
behaviour when this is not possible. It is conceivable, in the case where there 
are no context- or history-dependent restrictions on what messages may legally
be sent, when, and to whom, that nothing more need be said about the meaning
of a message beyond that already covered by the content language and ontology
specifications. Abstractly, whatever the actual typing scheme, every message
can be viewed as effectively being of type say: sending a message constitutes
a complete interaction (assuming it is successfully delivered), and the
meaning of agent A sending message X to agent B is just “A said X to B”, for
A, B and any suitably privileged observer^.

This situation would however be highly unusual; MCL’s of every sort impose 
some restrictions on which messages may be sent and when, and how the receiver 
of a message may or must behave. The fundamental reason for this is that 
an individual message rarely constitutes a complete interaction: some messages 
serve to initiate a new interaction, some to continue an existing one, and others 
to terminate it. Unless it permits only trivial, single-message interactions, an
ACL specification must describe precisely the relationships between interactions 
and individual messages. 

Requirement 4 An ACL specification should define the set of valid interactions 
which may be assembled from individual messages by formally specifying:

— what constitutes an interaction and how different interactions are distin- 
guished, 

— how messages are to be associated with interactions, new or existing,

— the interaction contexts in which messages may or must be sent, and 

— how their sending and receipt affects the future course of the interaction. 

Conventionally, this requirement is met by protocol specifications. These make 
explicit some common understanding of how communication is to occur by con- 
straining which message types may or must be sent in what contexts, by placing 
additional restrictions on the values of message parameters in particular con- 
texts, and perhaps also by imposing temporal constraints. In traditional MCL’s 
the use of a protocol is obligatory and the complete set of protocols, including 
variants and subsets, are specified explicitly. There is an explicit notion of inter- 
action state, and it is clear which messages initiate a new interaction, and which 
continue or terminate an existing interaction. This approach can be viewed as 
giving meaning to messages only in terms of their effect upon future commu- 
nication; other effects upon the behaviour of the sender or recipient are the 
subject of separate specification. Protocols constitute a normative specification 
of interaction which does not assume any particular model of the participants. 
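
A protocol with an explicit notion of interaction state can be sketched as a
transition table together with a record of the state of each interaction. The
protocol below, the interaction identifiers, and the class name are assumptions
made for illustration; the sketch also ignores which participant sends each
message, which a real protocol specification would have to constrain.

```python
# Hypothetical request protocol: (current state, message type) -> next state.
PROTOCOL = {
    ("start", "request"): "requested",    # initiates a new interaction
    ("requested", "agree"): "agreed",     # continues it
    ("requested", "refuse"): "closed",    # terminates it
    ("agreed", "inform"): "closed",
    ("agreed", "failure"): "closed",
}


class InteractionTable:
    """Tracks the state of every interaction, keyed by an interaction identifier."""

    def __init__(self):
        self.states = {}

    def accept(self, interaction_id, msg_type):
        """Apply a message to its interaction; return False if it is not valid."""
        state = self.states.get(interaction_id, "start")
        next_state = PROTOCOL.get((state, msg_type))
        if next_state is None:
            return False  # not a valid continuation in this interaction context
        self.states[interaction_id] = next_state
        return True


table = InteractionTable()
print(table.accept("c1", "request"))  # True: initiates interaction c1
print(table.accept("c2", "agree"))    # False: nothing to agree to in c2
print(table.accept("c1", "agree"))    # True: continues c1
print(table.accept("c1", "inform"))   # True: terminates c1
print(table.accept("c1", "inform"))   # False: c1 is already closed
```

Validity here depends only on the observable message sequence, not on any
model of the participants.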

Both KQML and FIPA adopt a quite different approach: they attempt (in
slightly different ways) to give a “precise formal semantics” to messages in terms 
of pre- and post-conditions upon the mental states of the sender and recipient. 

^ The situation is slightly more complex if multiple recipients are permitted, as the 
meaning may differ for the sender and recipients if the receiver parameter is not 
delivered or is modified by the transport layer. For example, the sender may know 
that a message was sent to multiple recipients whereas they may not. 

Characterizations of mental states are based on some mixture of what an agent
knows, believes, considers possible, wants, and intends. Restrictions on when and
to whom messages may be sent are captured by preconditions, which depend on
the message content and on the sender’s mental state; an expectation of the possible
effect upon the recipient is also given but cannot be relied upon. What is largely 
absent is an explicit notion of the state of the interaction, as opposed to those 
of the participants, so the set of valid interactions must somehow emerge from 
the preconditions. Agents can, so the theory goes, generate valid interactions by 
planning, and by responding cooperatively to received messages. 

Various problems with this approach have been pointed out [17,20,14], in- 
cluding: 

— the difficulty of finding an appropriate, uniform model of agents’ mental 
states, 

— the need for restrictions to be adequately complete and consistent without 
ruling out useful interactions or imposing a particular model of agent ratio- 
nality, and 

— the difficulty of determining when an agent is conforming to the specification, 
given the unobservability of its mental states. 

Another difficulty is the complexity of interaction planning, i.e., reasoning
about the preconditions and effects of messages on mental states. The FIPA
specification recognizes this and allows but does not mandate that protocols
are used. What it fails to do however is to adequately define what constitutes
a distinct interaction, the constraints that apply to the use of individual
message types and on the values of their non-content parameters, how protocols
are to be used, and a sufficient set of standard protocols themselves. As a
specification, it falls far short of meeting the above requirement. Perhaps it
is simply not its intent to specify these things, but merely to provide a
framework in which that may be done. If this is the case its promulgation as a
standard to promote interoperation is decidedly premature. If not, it is
lamentably incomplete.

In our view the fundamental problem with the FIPA approach is the absence
of any well-defined notions of an interaction, its state, and its valid continua- 
tions. Without such notions an ACL specification cannot effectively prescribe or 
proscribe how interactions may occur or impose structure upon them; it simply 
has no normative force. As Singh has noted [17], every “conversation” between a 
pair of agents is arguably conformant, as all that conformance requires is to show 
that the mental states of both agents met the preconditions of each utterance 
they made, and FIPA does not in any way restrict how agents’ mental states are
implemented or change over time. Mental states are thus associated with par- 
ticular communicative acts and events, but as the “mental loop” is not closed, 
communicative acts are not associated with each other or with interactions in 
any publicly verifiable way. 

Without a well-defined notion of an interaction, it may not even be possible 
to distinguish independent interactions. For example, suppose A sends message 
X to B and simultaneously B sends message Y to A, and they have never pre- 
viously communicated. From an external perspective, these are clearly separate 
interactions as there is no causal relationship between them, but unless the ACL
precisely defines, in terms of message types and the use of non-content
parameters, how interactions are initiated, continued and distinguished, it may
be impossible for either agent to tell that the other’s message is not a
(probably inappropriate) response to the one they sent. FIPA does not and
cannot effectively address such issues, relying as it does on a fuzzy, informal
notion of a conversation that lacks clear identity or extent.

A fundamental limitation of the mental-state precondition approach is that 
it is insufficiently expressive: it simply cannot capture the notion of an obligation 
to respond. Furthermore, as used by FIPA, preconditions apply only to message 
content and do not impose any constraints on non-content parameters. Some 
such obligations and requirements are in fact specified separately, often in nar- 
rative. For example, it is stated that “Agents should send not-understood if they 
receive a message they do not recognize or they are unable to process the content 
of the message. Agents must be prepared to receive and properly 
handle^ a not-understood message.” This is in fact an informally specified 
fragment of a protocol whose use is obligatory. Yet elsewhere it is stated clearly that 
the use of protocols is optional. This reveals a basic confusion at the heart of the 
specification between obligatory protocols associated with the use of particular 
performatives, and optional higher-level protocols which may be layered upon 
these. In FIPA these two levels are conflated. 

We take the view that an ACL specification, particularly one intended as a 
standard, must effectively impose some concrete, normative model of interaction 
which permits agents or their designers to unambiguously determine whether a 
given interaction is valid; if so, to determine its set of possible valid continuations; 
and if not, to determine which agent has failed to comply with the specification. 
Such a determination must be based on the interaction itself, not on privileged 
information about the agents’ mental states. To do this requires a complete 
set of explicit protocols which effectively capture the necessary restrictions and 
obligations on message occurrence and parameters. 
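
The three determinations asked for above (is the interaction valid, what are its valid continuations, and who failed to comply) can be sketched with an explicit transition table. This is a minimal illustration only: the protocol, its state names and its message types are hypothetical and are not taken from FIPA or from any specification discussed here. The point is that every answer is computed from the observable message sequence alone, never from the agents' mental states.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical two-party request protocol written as a transition table.
# Keys are (state, sender, message type); values are the next state.
TRANSITIONS: Dict[Tuple[str, str, str], str] = {
    ("start", "requestor", "request"): "requested",
    ("requested", "provider", "agree"): "agreed",
    ("requested", "provider", "refuse"): "closed",
    ("agreed", "provider", "done"): "closed",
    ("agreed", "provider", "failure"): "closed",
}


def valid_continuations(state: str) -> List[Tuple[str, str]]:
    """Return the (sender, message type) pairs permitted in the given state."""
    return sorted((s, m) for (st, s, m) in TRANSITIONS if st == state)


def check(trace: List[Tuple[str, str]]) -> Tuple[bool, str, Optional[str]]:
    """Validate a trace of (sender, message type) pairs.

    Returns (is_valid, last_state_reached, violator). The violator is the
    sender of the first message the table does not permit, or None.
    """
    state = "start"
    for sender, msg in trace:
        nxt = TRANSITIONS.get((state, sender, msg))
        if nxt is None:
            return False, state, sender
        state = nxt
    return True, state, None
```

Calling `check` on a trace in which the requestor sends `request` twice reports the requestor as the violator, and `valid_continuations("requested")` lists what the provider could legitimately have sent instead.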

If an interaction is not valid, because one or more of the participants (or per- 
haps the underlying platform) has violated the specification, then it may or may 
not be specified how the interaction may be continued; if not, the participants 
are in uncharted territory. Well-designed protocols, however, usually specify how 
at least some protocol violations should be handled to ensure that interactions 
can be cleanly terminated and resources released. They are thus more robust 
with regard to misimplementation and to certain types of system failures. We 
shall return to the issue of how protocols may be specified in Section 3. 



^ Unfortunately it is nowhere stated what exactly constitutes proper handling. 
2.4 Multiple Interactions 

Having specified what constitutes a valid interaction, an ACL specification should 
explicitly define two more aspects of agent interaction. The first concerns the 
relationship between an interaction and any previous interactions that may have 
occurred, usually those between a particular pair of agents, but sometimes glob- 
ally. This usually takes the form of additional restrictions on message parameter 
values. For example, it may be required that the value of a particular 
parameter, such as reply-with, never be reused across distinct interactions within 
the lifetime of an agent or system, that particular message types such as request 
always contain distinct values of parameters such as conversation-id, or that 
parameter values used by different agents must be distinct. Secondly, the speci- 
fication must say something about the possibility of multiple parallel interactions 
between agents, i.e. whether and when they are permitted, and if so, the mech- 
anisms that allow agents to distinguish different concurrent interactions. These 
aspects can be viewed as a specification of how individual interactions may be 
combined sequentially or interleaved. 

Requirement 5 An ACL specification should define: 

— any restrictions on sequential or concurrent interactions, and 

— any other global requirements upon message parameter values. 

FIPA does not address this requirement, even informally. 
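
As an illustration of what a specification meeting Requirement 5 would make checkable, the sketch below tests a message log against two global rules of the kind mentioned above. Both rules, and the dictionary layout of a message, are assumptions made for the example; neither is claimed to be part of any existing ACL.

```python
from typing import Dict, List, Set, Tuple

# Assumed message layout: a dict with keys "sender" and "type", and
# optionally "conversation-id" and "reply-with".
Message = Dict[str, str]


def check_global_constraints(log: List[Message]) -> List[str]:
    """Return the violations of two illustrative global rules.

    Rule 1: a reply-with value is never reused by the same sender.
    Rule 2: every request opens a new interaction, so its conversation-id
            must not have appeared earlier in the log.
    """
    violations: List[str] = []
    seen_reply_with: Set[Tuple[str, str]] = set()
    seen_conversations: Set[str] = set()
    for i, msg in enumerate(log):
        tag = msg.get("reply-with")
        if tag is not None:
            key = (msg["sender"], tag)
            if key in seen_reply_with:
                violations.append(f"message {i}: reply-with {tag!r} reused")
            seen_reply_with.add(key)
        conv = msg.get("conversation-id")
        if msg["type"] == "request" and conv in seen_conversations:
            violations.append(f"message {i}: conversation-id {conv!r} reused")
        if conv is not None:
            seen_conversations.add(conv)
    return violations
```

Because distinct conversation-id values keep concurrent interactions apart, the same log can then be split by that parameter and each interaction validated separately.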



2.5 Models of Mental States 

Even if one accepts that interactions should emerge not from protocols but from 
the assembly of messages according to preconditions on agent mental states, 
there are problems with the way this is done in FIPA. The model of mental 
states adopted is based upon five modal operators: belief (B), uncertainty (U), 
choice (C), persistent goal (PG) and intention (I). Amongst the criticisms that 
might be made of this model, the following are most relevant to the question of 
the overall quality and reliability of the ACL. 

1. The model of belief and uncertainty adopted is peculiar. Rather than the 
conventional axioms which admit $B_i\phi$, $B_i\neg\phi$, and 
$X_i\phi = \neg B_i\phi \wedge \neg B_i\neg\phi$ as the 3 mutually exclusive 
states of an agent’s beliefs, the additional states $U_i\phi$ and $U_i\neg\phi$ 
are added. These are defined as an agent believing $\phi$ is more (resp. less) 
likely than $\neg\phi$, and the state of ignorance becomes 
$X_i\phi = \neg B_i\phi \wedge \neg U_i\phi \wedge \neg U_i\neg\phi \wedge \neg B_i\neg\phi$. 
The logical axioms underlying this discrete notion of uncertainty and its 
closure properties are not spelled out. The use of this model has the following 
consequences. 

— The act of informing another of one’s belief state is split into the mutually 
exclusive message types inform, confirm, and disconfirm, all of which 
have the same effect and differ only in the context in which they may 
be used. For example, if agent $A_1$ believes $\phi$, but also believes (of 
$A_2$) that $B_2\neg\phi$ or $U_2\neg\phi$, it must perform 
disconfirm($A_2$, $\neg\phi$) rather than inform($A_2$, $\phi$). The only escape 
from this is for an agent to have no beliefs about the mental states of others, 
in which case it may use inform in any circumstance. 

— It is impossible to directly communicate states of uncertainty, as there 
are no messages whose direct effect is to make another agent uncertain 
or ignorant about something. Instead, one may only attempt to communicate 
indirectly and rely upon the recipient’s performing the right 
inferences. Suppose $A_1$ performs inform($A_2$, $\phi$) with effect $B_2\phi$, but 
then subsequently becomes uncertain about $\phi$. It can then only perform 
inform($A_2$, $\neg B_1\phi$) (or disconfirm($A_2$, $B_1\phi$)), whose effect is 
$B_2\neg B_1\phi$ rather than $\neg B_2\phi$, or inform($A_2$, $U_1\phi$), whose 
effect is $B_2 U_1\phi$. Thus simple agents unable to perform such reasoning 
are precluded from communicating states of ignorance. 

2. It is however required that an agent have beliefs about the mental states of 
others to use certain performatives. For example, part of the precondition 
for an agent’s sending a cancel is $B_1(B_2 I_1 Done(a) \vee U_2 I_1 Done(a))$. Thus 
a simple agent that does not model the mental states of others cannot use 
cancel and remain conformant; it presumably must use the more basic 
inform($A_2$, $\neg I_1 Done(a)$) to cancel a previously sent request($A_2$, $a$). It 
is not even clear that the basic action request can be used: the specification is 
inconsistent with regard to its preconditions. 

3. The preconditions chosen are unnecessarily restrictive. To use other perfor- 
matives it is required that an agent not have certain beliefs. For example, an 
agent may not question another about a proposition if it already believes or 
disbelieves it. There are obvious circumstances in which this is nonetheless 
a useful thing to do. 



3 Protocol Specification 

FIPA relegates protocols to a secondary role, regarding them as optional 
patterns: a means for simple agents incapable of adequate reasoning to engage in 
meaningful interactions, rather than a mechanism by which interactions are 
defined and meaning given to messages. Only a handful of protocols are defined, 
including trivial ones for requesting and querying, the ubiquitous contract net, 
and two varieties of auctions. At just one point, however, FIPA does briefly 
recognize the role that protocols should play. In describing the request-when 
protocol, it comments that it is “simply an expression of the full intended 
meaning of the request-when action”. 

Protocols are used by sending messages containing a protocol parameter. 
Their use is not only optional, but potentially partial; an agent may abandon a 
protocol half-way through merely by sending a message not carrying the protocol 
parameter. The exact constraints upon use of the conversation-id, reply-with 
and in-reply-to parameters are never clear, nor is their connection to the use of 
protocols. In various places vague notions of sub-protocol or sub-conversation 
are introduced, but FIPA never distinguishes between hierarchical composition 
of protocols and independent concurrent interactions. 

Fig. 1. Service Requestor and Provider Protocols 

FIPA protocols are specified by flow-chart diagrams, and some also by a new 
diagrammatic technique derived from the UML object-oriented specification for- 
malism. As specifications, they are informal and lack detail. The new technique 
is less perspicuous than the flow-chart one, and introduces a range of notations 
which focus on peripheral issues, such as cardinality and repetition of individ- 
ual messages, and ignores central ones such as the constraints applying to non- 
content parameters and how their values affect protocol state. It is not even clear 
that the technique is sufficiently expressive to capture iterative protocols. By 
contrast, the techniques adopted in specifying KQML interaction protocols [11] 
effectively deal with these central issues. 

All the protocols defined by FIPA are also of the simplest type: alternating 
or half-duplex protocols, where at any given point only one agent may send a 
message. Thus the particularly important issue of how message collisions, which 
may occur when communication is asynchronous and full-duplex, should be dealt 
with is never addressed. For example, the request protocol does not even include 
the case of the requestor cancelling the request, let alone address the possibility 
of a message collision. Reliable protocols must recognize and deal effectively with 
such possibilities. A well-designed protocol will also specify how a participant 
in an interaction should behave in the event that the protocol is violated by 
another, in order to ensure clean termination. 

Figure 1 is an example of an ACL protocol for service provision, taken 
from [3], that addresses such issues. It is specified by state machines, with transi- 
tions between states labelled by sent (!) or received (?) messages. The behaviours 
of the requestor and the provider are specified separately. Message parameters 
are omitted for clarity; however, distinct subcases are identified in 
parentheses. Note, for example, that it effectively captures such subtleties as 
that the requestor may cancel a request after suspending it, but not vice versa. 
This protocol and the syntax of its underlying messages have been more formally 
specified [3] in the Z language [18], and its properties verified by the 
model-checking tool SMV [19]. 

Fig. 2. Message Interaction Scenarios 

Figure 2 shows several service provision interactions. The third illustrates 
the case where a collision occurs between a cancellation and a completion message. 
Not only is the protocol robust in this case, but the requestor can actually 
determine that completion occurred prior to cancellation, i.e., distinguish the 
second and third cases, by examining the (non-content) status parameter of the 
completion message. The design objectives of the protocol, which are discussed 
in detail elsewhere [9], are to provide: 

— support for multiple concurrent service provisions, 

— functionality such as the provision of interim results, suspend/resume and 
cancel, 

— robustness and clean termination on error, 

— flexibility and efficiency in communication, and 

— verifiability of protocol properties. 
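
The collision case described for Figure 2 can be sketched from the requestor's side as follows. The figure itself is not reproduced here, so the state names, the message names and the two status values (`"completed"` and `"cancelled"`) are hypothetical stand-ins rather than the protocol of [3]. The sketch only illustrates two points made in the text: cancel is allowed after suspend but not the reverse, and a non-content status parameter lets the requestor tell a completion that crossed its cancel from a cancellation that took effect.

```python
from typing import Optional


class Requestor:
    """Requestor side of one service interaction (hypothetical states)."""

    # (current state, message sent) -> next state
    _SENDS = {
        ("active", "suspend"): "suspended",
        ("suspended", "resume"): "active",
        ("active", "cancel"): "cancelling",
        ("suspended", "cancel"): "cancelling",  # cancel after suspend is allowed
        # no entry for ("cancelling", "suspend"): suspend after cancel is not
    }

    def __init__(self) -> None:
        self.state = "active"  # the request itself has already been sent
        self.outcome: Optional[str] = None

    def send(self, msg: str) -> None:
        nxt = self._SENDS.get((self.state, msg))
        if nxt is None:
            raise ValueError(f"cannot send {msg!r} in state {self.state!r}")
        self.state = nxt

    def receive_done(self, status: str) -> None:
        """Handle the provider's final message, including a collision."""
        if self.state == "closed":
            raise ValueError("interaction already closed")
        if status == "cancelled":
            if self.state != "cancelling":
                raise ValueError("cancellation reported but never requested")
            self.outcome = "cancelled"
        elif status == "completed":
            # If our cancel was already sent, the two messages crossed.
            crossed = self.state == "cancelling"
            self.outcome = "completed-before-cancel" if crossed else "completed"
        else:
            raise ValueError(f"unknown status {status!r}")
        self.state = "closed"
```

In both the collision and the ordinary cancellation the interaction ends in the same closed state, which is the clean termination the design objectives call for.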

We are not suggesting that this ACL represents an alternative to FIPA. Rather 
it illustrates a dramatically different approach to how ACLs and their 
protocols should be designed and specified, one which takes seriously issues such 
as robustness and verifiability. Our experiences suggest that adequately addressing 
such issues at the ACL design stage is crucial to the successful application of 
MAS technology on a non-trivial scale, even for closed agent systems. 

4 Conclusion 

Evaluation of agent communication languages is an issue that has begun to be 
addressed seriously by agent researchers [13,19]. A survey and comparison 
between KQML and FIPA which points out some important pragmatic issues 
has also just appeared [12]. This paper has likewise attempted to view the 
specification of ACLs from a pragmatic perspective, focusing on the need for 
reliability, which we argue can only be achieved by adopting suitably formal 
specification techniques. We have proposed a number of requirements against which 
an ACL might be evaluated, and pointed out how far the FIPA ACL specification 
falls short in this regard: it is informal, underspecified, ambiguous and vague 
about central issues. Little significant progress seems to have been made since 
its first release. By contrast, the latest proposed KQML specification [10], while 
still rather informal, is far more systematic and complete, and related work on 
procedurally specifying KQML interaction protocols [11] is adopting more formal 
techniques. 

There is also a growing realization [17,20,14] that the mental-state 
precondition approach adopted by FIPA and KQML is incapable of providing an 
effective, normative basis for agent interaction and interoperation, and that 
explicit, obligatory protocols are required. We concur with this view. FIPA has 
focussed on an interesting but peripheral issue, the relationship between 
communicative acts and mental states, and failed to address the central issue of 
the direct relationships between communicative acts or to define precisely an 
appropriate notion of interaction. From this one might reasonably conclude that 
the FIPA ACL is 
a rich field for further research and careful engineering, but is, as yet, quite in- 
adequate as a standard. But suitably clarified, formalized and augmented by a 
comprehensive set of interaction protocols, it has the potential to fill that role. 
Abstract. Acknowledgements, agreements, and disagreements are basic 
moves in communications among agents, since the moves form and revise 
shared information among the agents, which is a basic prerequisite 
of group actions. This paper investigates formal semantics of the moves 
from the point of view of information sharing among agents, exploiting 
the circular objects assured by Hyperset Theory. Thus, avoiding 
definitions of shared information by infinite conjunctions of propositions 
with nested epistemic modalities, the actions are all interpreted as 
one-step (not infinitely many step) formations of shared information by 
corecursive definitions. As a result, we can provide a structure of inference 
between the actions, and define a process equivalence of dialogues with 
respect to their resulting shared information. 



1 Introduction 

This paper investigates formal semantics of acknowledgements, agreements, and 
disagreements as a formal basis for associating information sharing among 
communicating agents, especially using dialogues. The notions of shared 
information, mutual beliefs, or common knowledge, i.e., information contents 
shared in a group, play important roles in cooperative group actions or making 
joint intentions in a group (Cohen et al. [7,8]), and, conversely, making joint 
intentions requires agreements, i.e., making shared information as discussed 
in [7].^1 However, shared information is usually modeled as infinite 
conjunctions of propositions with nested belief/knowledge operators like: 
$Bel(a,p) \wedge Bel(b,p) \wedge Bel(b,Bel(a,p)) \wedge Bel(a,Bel(b,p)) \wedge \dots$ 
as in [7]. Such a definition of shared information brings about problems 
concerning the finiteness of a process of information sharing. On the other 
hand, shared information can also be modeled as circular objects such as the 
solution of $x = \{x\}$ using Hyperset Theory [1,2,4]. Circular objects have 
infinite information but have finite representations, so their operations are 
also defined finitely, and these definitions are called corecursive 
definitions [4]. This property helps to solve the problem of finiteness of 
acknowledgements [15]. Moves in communications or acts in group actions are 
formalized as maps from a bit of shared/private information to a bit of shared 
information. Acknowledgements, agreements, and disagreements are formalized as 
functions by corecursion. Although we can assume many paths of communication or 
group actions to shared information, each resulting state, i.e., shared 
information, is unique. Our semantics will assure such a property. 

^1 Although the Cohen-Levesque approach [7,8] is concerned with individual mental 
states, our framework does not concern individual mental states but only public 
information such as information about the gameboard in a multi-player game. 
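
The contrast drawn in the introduction between an infinite conjunction of nested beliefs and a finite circular object can be made concrete with a small sketch. It is only an analogy: a Python dictionary that refers to itself stands in for a hyperset such as the solution of $x = \{x\}$, and the function and key names are invented for the example.

```python
from typing import Any, Dict, List, Tuple


def nested_beliefs(a: str, b: str, p: str, depth: int) -> List[str]:
    """First 2*depth conjuncts of Bel(a,p) ∧ Bel(b,p) ∧ Bel(a,Bel(b,p)) ∧ ...

    The full conjunction is infinite, so any program can only ever
    enumerate a finite prefix of it.
    """
    conjuncts: List[str] = []
    inner_a, inner_b = p, p
    for _ in range(depth):
        bel_a = f"Bel({a},{inner_a})"
        bel_b = f"Bel({b},{inner_b})"
        conjuncts += [bel_a, bel_b]
        # The next level wraps the other agent's belief from this level.
        inner_a, inner_b = bel_b, bel_a
    return conjuncts


def make_shared(agents: Tuple[str, str], p: str) -> Dict[str, Any]:
    """A finite, circular stand-in for 'agents share p'.

    The object contains a reference to itself, so the fact that the
    sharing is itself shared needs no further storage.
    """
    s: Dict[str, Any] = {"agents": agents, "content": p}
    s["shared"] = s
    return s
```

Following the `"shared"` key any number of times returns the very same object, which is the finite representation of unboundedly nested information that the text attributes to circular objects.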

In section 2, we will see agreements, acknowledgements, and disagreements 
from the point of view of information sharing. In section 3, we will see problems 
in the semantics of agreements, acknowledgements, and disagreements from the 
point of view of information sharing. In section 4, we will see a formal semantics 
of agreements, acknowledgements, and disagreements in terms of shared infor- 
mation based on Hyperset Theory. 



2 Acknowledgements, Agreements, and Disagreements 
from the Point of View of Information Sharing 

2.1 Dialogues and Information Sharing 

While dialogues and other types of discourse can be considered as acts involv- 
ing the agents’ information state transitions, the main distinctive property of 
communication is that the resulting information states of the agents of the com- 
munication can be divided into a state shared between the agents and unshared 
or private information. Only the former is available for observation based on 
sequences of messages used in a communication. Therefore, the semantics of di- 
alogues or communication should not be directly associated with the real mental 
states of agents (i.e., private information), but with information shared between 
the agents as the result of the dialogues. Some researchers have already pointed 
out the connection between natural language use and shared information or 
mutual beliefs (cf. Definite Reference [6]), but they mainly argue that shared 
information is a part of presuppositions of dialogues. However, the direct con- 
nection between each move of a dialogue and information sharing has not yet 
been discussed. The main focus of this paper is to clarify and formalize what 
are agreements, acknowledgements, and disagreements and which information 
possessed by one of the agents becomes public in the group of agents through 
the processes of dialogues. 



2.2 Acknowledgements, Agreements, Disagreements and 
Information Sharing 

Information which is possessed by one of a group of agents can be transmitted by 
his or her signal to the others via a channel shared among them. A dialogue can 
be regarded as an example of such a process. However, dialogues are not simple 
one-way transmission of information. In dialogues, we can observe, as in (1), 
some types of moves which are evidence that dialogues are two-way 
communication: acknowledgement, agreement and disagreement, as 
discourse analysts describe. 

(1) a. Claire: Does Bill have the three of spades? Max: Yes. (agreement) 

b. Claire: Bill has the three of spades. Max: Uh-huh. (acknowledgement) 

c. Claire: Bill has the three of spades. Max: I don’t think so. (disagreement) 

Therefore, when we try to propose a formal semantics of such moves or dialogues, 
even of more abstract two-way communication, it is natural that the semantics 
can be provided based on shared information between the communicating agents. 

Let us provide a formal semantics of such moves based on shared information. 
In (1), all the shared information in each dialogue is stated, respectively, as 
follows. 

(2) a. Claire and Max shared the information that Bill has the three of spades. 

b. Claire and Max shared the information that Claire thinks that Bill has 
the three of spades. 

c. Claire and Max shared the information that Claire thinks that Bill has 
the three of spades but Max doesn’t think so. 

On the other hand, each of the initial moves can be supposed to have the fol- 
lowing information. 

(3) Claire thinks that Bill has the three of spades. 

Therefore, in this case, we can provide a semantics of an agreement, an acknowl- 
edgement, and a disagreement as a function from (3) to (2a), a function from (3) 
to (2b), and a function from (3) to (2c), respectively. Thus, if we can formalize 
information described in (2) and (3), then we can give a formal semantics of 
agreements, acknowledgements, and disagreements. 
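
The idea of treating the three moves as functions from the initial information (3) to the shared information in (2a), (2b) and (2c) can be sketched directly. The string encoding of propositions and the names `Shared` and `thinks` are assumptions made for illustration; the formal version in the paper uses Situation Theory rather than strings.

```python
from typing import FrozenSet, NamedTuple


class Shared(NamedTuple):
    """Information shared by a group of agents (illustrative encoding)."""
    agents: FrozenSet[str]
    content: str


def thinks(agent: str, p: str) -> str:
    return f"thinks({agent}, {p})"


def agreement(speaker: str, hearer: str, p: str) -> Shared:
    """(3) -> (2a): the proposition itself becomes shared."""
    return Shared(frozenset({speaker, hearer}), p)


def acknowledgement(speaker: str, hearer: str, p: str) -> Shared:
    """(3) -> (2b): only the fact that the speaker thinks p becomes shared."""
    return Shared(frozenset({speaker, hearer}), thinks(speaker, p))


def disagreement(speaker: str, hearer: str, p: str) -> Shared:
    """(3) -> (2c): the speaker's view and the hearer's dissent become shared."""
    content = f"{thinks(speaker, p)} and not {thinks(hearer, p)}"
    return Shared(frozenset({speaker, hearer}), content)
```

Because the group is stored as a set, `agreement("Claire", "Max", p)` and `agreement("Max", "Claire", p)` yield the same value, which matches the intuition that it does not matter who asserted and who agreed once the proposition is shared.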

3 Problems of Semantics of Dialogues 

3.1 Equivalence of Dialogues 

The relation between dialogues and shared information allows for a notion of 
equivalence of dialogues as processes based on identification of the shared infor- 
mation that they give rise to. Each of the following dialogues is equivalent to a 
dialogue in (1) with respect to the shared information if their previous shared 
information is ignored.^2 

^2 This is not an exhaustive list of equivalent dialogues with respect to shared 
information. For example, dialogues which contain subdialogues triggered by 
moves such as repair moves [17] are not listed. On formal semantics of 
subdialogues, see [16]. Furthermore, many other possibilities can be considered 
as follows, but the treatment of these cases is our future work: 

— Claire: You don’t think Bill has the three of spades. Max: Yes. Claire: But I think 
Bill has the three of spades. Max: Uh-huh. (=(1c)) 

— Max: You think Bill has the three of spades. Claire: That’s right. (=(1b)) 

(4) a. Claire: Bill has the three of spades. Max: That’s right. (=(1a)) 

b. Max: Does Bill have the three of spades? Claire: Yes. (=(1a)) 

c. Max: Bill has the three of spades. Claire: That’s right. (=(1a)) 

Therefore, there are many paths of dialogue by which information can be shared. 
Consequently, a semantics of dialogues provided from the viewpoint of 
information sharing must assure the equivalence of dialogues with respect to 
shared information, and simultaneously each dialogue must be distinguishable 
from the others in some sense even if they are equivalent with respect to shared 
information. 

3.2 Compositionality of Shared Information Revisions 

If we adopt the viewpoint of a dialogue process as information sharing, the 
process $\pi$ of talking about a proposition $\psi$ can be modeled with the update 
function of the presupposed shared information as follows: 

$F_\pi(S_{A,B}\varphi) = S_{A,B}(\varphi \wedge \psi)$.^3 

However, $F_\pi$ is a composition of the update functions $F_{m_i}$ for each move 
$m_i$ ($i < n$). Namely, 

$F_\pi = F_{m_0} \circ F_{m_1} \circ \dots \circ F_{m_{n-1}}$. 

This means that the definition of each $F_{m_i}$ is problematic. For example, 
while dialogue (5a) can be modeled as the function (5b), 

(5) a. Claire: Does Bill have the three of spades? Max: Yes. 

b. $F_{(5a)}(S_{Claire,Max}\varphi) = F_{Claire:Does\_Bill\_have\_3\spadesuit?} \circ F_{Max:Yes}(S_{Claire,Max}\varphi)$ 
$= S_{Claire,Max}(\varphi \wedge have(Bill, 3\spadesuit))$ 

we still need to define $F_{Claire:Does\_Bill\_have\_3\spadesuit?}$ and $F_{Max:Yes}$. We can 
easily define any total dialogue $\pi$ as the update function $F_\pi$ of shared 
information, but the update function $F_m$ of each move $m$ is not a direct 
update function of shared information. Therefore, if we associate processes of 
dialogues with shared information formed in the processes, we should define the 
meaning of each move in terms of information sharing. 
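
The compositionality requirement above can be illustrated by giving each move its own update function and obtaining the dialogue's update by composition. Everything in this sketch is an assumption made for the example and not the semantics developed in the paper: the `State` type, the idea that a question places a proposition "under discussion", and the reading of the composition as applying moves in the order they are uttered.

```python
from functools import reduce
from typing import Callable, FrozenSet, NamedTuple, Optional


class State(NamedTuple):
    shared: FrozenSet[str]   # propositions already shared
    pending: Optional[str]   # proposition currently under discussion


Update = Callable[[State], State]


def ask(p: str) -> Update:
    """A question shares nothing yet; it only raises p for discussion."""
    return lambda s: State(s.shared, p)


def yes() -> Update:
    """An affirmative answer turns the pending proposition into shared information."""
    def update(s: State) -> State:
        if s.pending is None:
            return s
        return State(s.shared | {s.pending}, None)
    return update


def dialogue(*moves: Update) -> Update:
    """Compose per-move updates, applying them in utterance order."""
    return lambda s: reduce(lambda acc, move: move(acc), moves, s)
```

With these definitions the whole of dialogue (5a) maps a state sharing `phi` to one sharing both `phi` and the new proposition, while neither move on its own performs that update, which is the difficulty the section points out.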

3.3 Finiteness of Acknowledgements 

The notion of shared information can be defined formally in many ways. In [2], 
shared information is formalized using Hyperset Theory within the framework of 
Situation Theory and its definitions are classified into three classes; in [10], 
many of its definitions within the framework of the modal logic of knowledge are 
discussed. There is still a question, however, of the relevance of such a formal 
definition of shared information to the semantics of dialogues. One of the 
properties of dialogues, the finiteness of acknowledgements, requires the 
well-definedness of the notion of shared information. Normal dialogues finish in 
finitely many moves, as in (6a).^4 

^3 $S_{A,B}\varphi$ means that A and B share the information $\varphi$. 

^4 Sometimes dialogues seem to continue a few steps as follows. 






Norihiro Ogata 



(6) a. Claire: Bill has the three of spades. Max: Uh-huh. 

b. * Claire: Bill has the three of spades. Max: Uh-huh. Claire: Uh-huh. 

Uh-huh. ... 

However, if a move requires its receiver’s acknowledgement, an acknowledgement 
also requires its receiver’s acknowledgement, since an acknowledgement itself is a 
move. As a result, a dialogue could not terminate, as in (6b), while normally every 
dialogue terminates. This problem is deeply relevant to the semantics of an 
acknowledgement. If an acknowledgement means informing the reception 
of a message, i.e., a function $F : p \mapsto Bel(x, Bel(y,p))$, where p is the 
information content of the received message, x the utterer of the 
acknowledgement, and y the sender of the message, then every acknowledgement 
requires its acknowledgement by the other in order to achieve information sharing 
of the reception of the message, i.e., 
$Bel(x, Bel(y,p)) \wedge \dots \wedge Bel(y, Bel(x, Bel(y, \dots, Bel(y,p) \dots)))$. 
Even if an acknowledgement means achieving information sharing of the reception 
of a message, i.e., a function $G : p \mapsto S_{x,y}(Bel(y,p))$, the information 
$S_{x,y}(Bel(y,p))$ itself must be shared among the agents, and requires its 
acknowledgement. Therefore, for any proposition $q$, proposition $S_{x,y}q$ must 
imply proposition $S_{x,y}S_{x,y}q$ in order to terminate dialogues. 

According to Fagin et al. [10], shared information satisfies, at least, the Fixed 
Point Axiom (7a) and the Induction Rule (7b). 

(7) a. $S_{A,B}\varphi = Bel(A, S_{A,B}\varphi \cup \varphi) \wedge Bel(B, S_{A,B}\varphi \cup \varphi)$, 

b. $\varphi \to Bel(A, \varphi \wedge \psi) \wedge Bel(B, \varphi \wedge \psi)$ implies $\varphi \to S_{A,B}\psi$. 

From a semantic point of view, the Fixed Point Axiom says that common knowledge 
or a mutual belief $S_p$ between agents A and B can be viewed as a fixed point 
of the function $F_p(X)$, and the Induction Rule says it satisfies the semantic 
condition $X \subseteq F_p(X) \Rightarrow X \subseteq S_p$, where 
$F_p = \lambda Z. Bel_A(p \cup Z) \cap Bel_B(p \cup Z)$ and $Bel_X$ is the 
denotation of the operator $Bel(X, -)$.^5 This condition constrains us to 
view $S_p$ as the greatest fixed point of $F_p$, i.e., 
$gfp(F_p) = \bigcap_{n<\omega} F_p^{n}$, where $F_p^{n+1} = F_p(F_p^{n})$. That is, 
by the two axioms, common knowledge or a mutual belief is the greatest fixed 
point of knowledge or belief operators. Under the assumption that these axioms 
hold, $S_{x,y}q$ implies $S_{x,y}S_{x,y}q$, which is what was required above. 
That is, it is not necessary that either agent acknowledge an 
acknowledgement. Therefore, the problem of the infinity of acknowledgements is 
avoided, and the necessity of the well-definedness of shared information is shown. 
The semantics of dialogues must be able to distinguish relevant public 
information from other information, and to define it as well-defined shared 
information in the sense that it satisfies the axioms of shared information. 
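
The greatest-fixed-point reading of shared information can be checked on a small finite model. The sketch below assumes a standard possible-worlds semantics, which is an assumption of this example and not the situation-theoretic semantics of the paper: a proposition is a set of worlds, combining two pieces of information is set intersection, and an agent believes a proposition at a world when every world accessible from it lies in that proposition. The model at the end is invented.

```python
from typing import Dict, FrozenSet, Set

World = int
Access = Dict[World, Set[World]]


def bel(access: Access, prop: FrozenSet[World]) -> FrozenSet[World]:
    """Worlds at which the agent believes prop."""
    return frozenset(w for w in access if access[w] <= prop)


def shared(p: FrozenSet[World], acc_a: Access, acc_b: Access) -> FrozenSet[World]:
    """Greatest fixed point of F(Z) = Bel_A(p ∩ Z) ∩ Bel_B(p ∩ Z).

    F is monotone, so iterating downward from the set of all worlds
    reaches the greatest fixed point after finitely many steps.
    """
    z = frozenset(acc_a)
    while True:
        nxt = bel(acc_a, p & z) & bel(acc_b, p & z)
        if nxt == z:
            return z
        z = nxt


# Invented four-world model: agent A is fully informed, agent B is not.
ACC_A: Access = {0: {0}, 1: {1}, 2: {2}, 3: {3}}
ACC_B: Access = {0: {0}, 1: {2}, 2: {3}, 3: {3}}
P = frozenset({0, 1, 2})
S_P = shared(P, ACC_A, ACC_B)
```

On this model the iteration shrinks from all four worlds to `{0}`, and `shared(S_P, ACC_A, ACC_B)` returns the same set, so wherever p is shared, the sharing is itself shared. That is the property which makes a further acknowledgement of an acknowledgement unnecessary.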

— Claire: Does Bill have the three of spades? Max: Yes. Claire: Yeah. 

— Claire: Does Bill have the three of spades? Max: Yes. Claire: Really. Max: Yeah. 
Claire: Uh-huh. 

However, these examples can be analyzed as containing subdialogues with 
follow-up moves or check moves and so on. That is, acknowledgements or 
agreements are re-activated in some sense. 

^5 This condition is called the Coinduction Principle (see footnote 10 and [4]). 
3.4 The Collapse Problem 

If we assume that agent a’s agreement on sender b’s proposition p means 
that a also believes that p, the resulting shared information state contains only 
the shared proposition $Bel(a,p) \wedge Bel(b,p)$, since b’s move means that 
$Bel(b,p)$, whereas the agreement should rather mean sharing the proposition p 
itself. Komatsu [14,13] points out that there is such a gap between information 
obtained directly from agreements and shared information in dialogues containing 
an agreement, and we call this the Collapse Problem. For example, consider the 
following dialogue. 



(8) Claire: Bill has the three of spades. Max: Yeah, I think so, too. 

Claire’s move has information $Bel(Claire, have(Bill, 3\spadesuit))$,^6 and Max’s 
move “I think so” has information $Bel(Max, have(Bill, 3\spadesuit))$. Therefore, 
the shared information as the result of (8) must be (9a). 

(9) a. $S_{Claire,Max}(Bel(Claire, have(Bill, 3\spadesuit)) \wedge Bel(Max, have(Bill, 3\spadesuit)))$ 

b. $S_{Claire,Max}(have(Bill, 3\spadesuit))$ 

However, (8) is a typical dialogue of an agreement, and its resulting shared 
information must be (9b). Thus there is a gap of shared information between that 
obtained directly from each move and the result. We call the former agreement a 
weak agreement and the latter a strong agreement. To avoid the Collapse 
Problem, Komatsu [14,13] proposes the Collapse Axiom, which eliminates epistemic 
operators of shared contents as follows. 

(10) $S_{A,B}(Bel(A,p) \wedge Bel(B,p))$ implies $S_{A,B}(p)$. 

However, this is not a theorem deduced from the Fixed Point Axiom and the 
Induction Rule, but an additional axiom. This problem will be solved naturally 
by our formal semantics. 

4 Situation Theoretic Modeling of Shared Information 

4.1 Situation Theory and Hyperset Theory 

Situation Theory [5,3,2] is a set-theoretic framework of information contents. The 
version of Situation Theory enhanced by Hyperset Theory [1,4] (i.e., ZFC minus 
the Foundation Axiom plus the Anti-Foundation Axiom) can handle circular 
situations such as liars [3] and mutual beliefs [2], since Hyperset Theory can 
define circular objects such as a solution of an equation $x = \{x\}$ by the 
Anti-Foundation Axiom. More formally, given a set of atoms $\mathcal{U}$, an 
equational system of sets $\mathcal{E}$ is defined as follows. 

^6 Undoubtedly, an utterance of content p has such information, say Bel(a,p). This 
information content is not intended to mean that a believes p, but intended to mean 
that information content p can be attributed to a. That is, whatever a believes, a’s 
utterance p has information Bel(a,p). 

Definition 1 (Barwise & Moss [4]). 

1. A flat^7 equational system of sets (FESS) is a triple 
$\mathcal{E} = \langle X, A, e \rangle$, where X and A are sets of urelements such 
that $X \cap A = \emptyset$, and $e$ is a function $e : X \to pow(X \cup A)$. 

2. X is called the set of indeterminates of $\mathcal{E}$, and A is called the set 
of atoms of $\mathcal{E}$. 

3. A solution to $\mathcal{E}$ is a function $\theta$ with domain X satisfying 
$\theta(x) = \{\theta(y) \mid y \in e(x) \cap X\} \cup (e(x) \cap A)$, for each $x \in X$. 

Using the concept of FESS, we can state a form of Anti-Foundation Axiom as 
follows: 

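Anti-Foundation Axiom: Every FESS has a unique solution $\theta$.

Let $SolutionSet(\mathcal{E})$ be the set $\{\theta(x) \mid x \in X\}$ where $\mathcal{E} = \langle X, A, e\rangle$. We can define
the hyperuniverse $V_{afa}[\mathcal{U}]$ as follows.

$V_{afa}[\mathcal{U}] = \bigcup \{SolutionSet(\mathcal{E}) \mid \mathcal{E} \text{ is a FESS with atoms } A \subseteq \mathcal{U}\}$.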
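
To make the uniqueness claim concrete, the following Python sketch treats a flat system as a dictionary from indeterminates to their right-hand sides and computes the largest bisimulation on the indeterminates. Under the Anti-Foundation Axiom, two indeterminates receive the same solution exactly when they are bisimilar. The dictionary encoding and the function name `bisimulation` are assumptions made for illustration; they are not part of the original formalization.

```python
def bisimulation(system, atoms):
    """Largest bisimulation on the indeterminates of a flat system.

    `system` maps each indeterminate x to the set e(x), whose members are
    either indeterminates (keys of `system`) or atoms (members of `atoms`).
    """
    names = list(system)
    # Start from the full relation and remove pairs that cannot be matched
    # (a greatest-fixed-point computation).
    related = {(x, y) for x in names for y in names}

    def compatible(x, y):
        # Both sides must contain exactly the same atoms ...
        if system[x] & atoms != system[y] & atoms:
            return False
        kids_x, kids_y = system[x] - atoms, system[y] - atoms
        # ... and every indeterminate member must have a related partner.
        forth = all(any((u, v) in related for v in kids_y) for u in kids_x)
        back = all(any((u, v) in related for u in kids_x) for v in kids_y)
        return forth and back

    changed = True
    while changed:
        changed = False
        for pair in list(related):
            if not compatible(*pair):
                related.discard(pair)
                changed = True
    return related


atoms = {"a", "b"}
system = {
    "x": {"a", "x"},  # x = {a, x}
    "y": {"a", "z"},  # y = {a, z}
    "z": {"a", "y"},  # z = {a, y}
    "w": {"b", "w"},  # w = {b, w}
}
same = bisimulation(system, atoms)
print(("x", "y") in same, ("x", "w") in same)  # True False
```

Here `x`, `y`, and `z` all denote the same circular set, while `w` denotes a different one because it contains a different atom.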

As we will see later, mutual beliefs or shared information are also defined as 
circular objects, i.e., finite objects, but not as infinite objects as in [7]. Firstly 
we will define basic objects in Situation Theory as follows. 

Definition 2. Let $R \cup AG \cup CARD \cup I \cup \{type, ofType\}$ be a set of atoms,
where

— $R = \{H, Bel\}$ are relations which mean having and believing respectively,

— $AG = \{Max, Claire, Bill\}$ is a set of agents,

— $CARD = \{2\clubsuit, \ldots, A\spadesuit\}$ is a set of cards,

— $I = \{1, 0\}$ is a set of polarities, i.e., 1 means true and 0 false,

and let $SOA$, the class of states of affairs, $SIT$, the class of situations, $TYPE$,
the class of types, and $PROP$, the class of propositions, be the largest classes
satisfying the following conditions:



^ If $e$ is a function from indeterminates $X$ to $pow(A \cup X)$, then the system is called
flat, while if $e$ is a function from indeterminates $X$ to any set constructed from $A$
and $X$ basically, the system is called general. For example, $\{x = \langle a, x\rangle\}$ is general,
but one of its equivalents $\{x = \{y, z\},\ y = \{a\},\ z = \{a, x\}\}$ is flat.

^ A state of affairs is called a fact if an actual world contains it. See [3] p.75.

^ Strictly speaking, these propositions are called Austinian [3].

10 This definition uses the condition “the largest class.” Such a definition is a coinductive
definition, and objects over the hyperuniverse are defined by coinduction. While
an inductive definition can be viewed as constructing the least fixed point of a
monotonic functor $F$, that is, $lfp(F) = \bigcap \{X \mid F(X) \subseteq X\}$, a coinductive definition
can be viewed as constructing the greatest fixed point of a monotonic functor $F$, that
is, $gfp(F) = \bigcup \{X \mid X \subseteq F(X)\}$. Therefore, $x \in gfp(F)$ is shown by the Coinductive
Principle for $F$, i.e., it must be shown that $\{x\} \cup gfp(F) \subseteq F(\{x\} \cup gfp(F))$. See
Barwise & Moss [4].
— If $\sigma \in SOA$, then $\sigma$ is a tuple $\langle H, a, c, i\rangle$ or $\langle Bel, a, s, i\rangle$, where $a \in AG$, $c \in CARD$,
$i \in I$, and $s \in SIT$; if $\langle H, a, c, 1\rangle$ is in situation $s$, this represents $a$’s
having $c$ in $s$; if $\langle H, a, c, 0\rangle$ is in $s$, this represents $a$’s not having $c$ in $s$;
if $\langle Bel, a, s', 1\rangle$ is in $s$, this represents $a$’s believing $s'$ in $s$; and if
$\langle Bel, a, s', 0\rangle$ is in $s$, this represents $a$’s not believing $s'$ in $s$.

— If $s \in SIT$, then $s \subseteq SOA$,

— If $T \in TYPE$, then $T = \langle type, \sigma\rangle$, written $[\sigma]$, where $\sigma \in SOA$,

— If $p \in PROP$, then $p = \langle ofType, s, T\rangle$ ($s$ is of type $T$), written $(s : T)$,
where $s \in SIT$ and $T \in TYPE$.

The class $\{\langle H, a, c, i\rangle \mid a \in AG, c \in CARD, i \in \{1,0\}\}$ is called $BSOA$ (basic
states of affairs). A proposition $(s : [\sigma])$ is said to be true if $\sigma \in s$.

A revision of circularity itself can be defined by a substitution which can be 
defined by corecursion. 

Definition 3 (Barwise & Moss [4]). A substitution is a function $\theta$ whose
domain is a set of urelements. A substitution operation is an operation $sub$
whose domain consists of a class of pairs $\langle\theta, b\rangle$ where $\theta$ is a substitution and
$b \in \mathcal{U} \cup V_{afa}[\mathcal{U}]$, such that the following conditions are met.

1. If $x \in dom(\theta)$, then $sub(\theta, x) = \theta(x)$.

2. If $x \in \mathcal{U} \setminus dom(\theta)$, then $sub(\theta, x) = x$.

3. For all sets $b$, $sub(\theta, b) = \{sub(\theta, a) \mid a \in b\}$.

Barwise & Moss [4] have shown the existence and uniqueness of $sub$.

Example 1. A substitution $\theta_b$ serving as the revision function required in (11) is defined
by corecursion as follows:

$\theta_b(u) = \theta_b(u) \cup \{b\}$,

for every indeterminate $u$.

(11) $\{x = \{a, x\}\} \Rightarrow \{x = \{a, b, x\}\}$

That is, $\theta_b(x) = \theta_b(x) \cup \{b\} = \{\theta_b(x), a\} \cup \{b\} = \{\theta_b(x), a, b\}$.

The existence and uniqueness of the solution to the equation $\theta_b(x) = \{\theta_b(x), a, b\}$
are guaranteed by the Anti-Foundation Axiom.



4.2 Modeling Shared Information 

Shared information or mutual beliefs can be considered as circular objects in 
Situation Theory. Barwise [2] compares three approaches to modeling of shared 

Barwise et al. [3,2] apply Hyperset Theory to the construction of circular propositions.
For example, the so-called liar sentences, e.g., (a): (a) is false, are regarded as
expressing a proposition $(s : [Tr, p, 0])$ for a situation $s$ where each sentence is used,
and $p$ a solution to $p = (s : [Tr, p, 0])$. Standard semantic theories can only treat liar
sentences as having undefined semantics.
information, and two of them ((ii) and (iii)) are defined by hypersets, while one
((i)) is an infinite set. For example, the information shared between $a$
and $b$ that $a$ has the three of spades at situation $s$ is defined in each approach
as follows:

(i) the Iterate: $s_I = \bigcup_{\alpha<\omega} s_\alpha$, where $s_{\alpha+1} = \{\langle Bel, a, s_\alpha, 1\rangle, \langle Bel, b, s_\alpha, 1\rangle\}$,
and $s_0 = \{\langle H, a, 3\spadesuit, 1\rangle\}$;

(ii) the Fixed Point: $s_F = \{\langle Bel, a, \{\langle H, a, 3\spadesuit, 1\rangle\} \cup s_F, 1\rangle,
\langle Bel, b, \{\langle H, a, 3\spadesuit, 1\rangle\} \cup s_F, 1\rangle\}$;

(iii) the Shared-Situation: $s_S = \{\langle Bel, a, s_S, 1\rangle, \langle Bel, b, s_S, 1\rangle, \langle H, a, 3\spadesuit, 1\rangle\}$.

We will adopt the Shared-Situation approach here, since it formalizes shared
information as finite objects, and the Fixed Point approach, which can similarly
formalize them as finite objects, can be considered a distributed information of
the shared information. Namely, the Fixed Point situation $s_F$ can be considered
the solution to $s_F = \{\langle Bel, a, t, 1\rangle, \langle Bel, b, t, 1\rangle\}$, and $t = \{\langle H, a, 3\spadesuit, 1\rangle\} \cup s_F$,
i.e., $t = \{\langle H, a, 3\spadesuit, 1\rangle, \langle Bel, a, t, 1\rangle, \langle Bel, b, t, 1\rangle\}$. Therefore,

$s_F = \{\langle Bel, a, s_S, 1\rangle, \langle Bel, b, s_S, 1\rangle\}$.

Thus, the intrinsic circularity of shared information is expressed by the Shared
Situation.

We introduce the notion of an equational system of situations (ESS), which
is basically an equational system of sets, expressed as a tuple $\mathcal{E} = \langle S, A, P, e, s\rangle$,
where $S$ is a set of indeterminates, $A$ a set of agents, $P \subseteq BSOA$, $s \in S$ (called
the root of $\mathcal{E}$), $e : S \to \Gamma(P \cup S)$, and $\Gamma : X \mapsto \{\langle Bel, x, s, i\rangle \mid s \in pow(X), x \in A,
i \in \{1,0\}\} \cup BSOA$.

For example, $s_S$ can be considered as a solution to an equational system of
situations:

$\langle \{s\}, \{a, b\}, \{\langle H, a, 3\spadesuit, 1\rangle\}, \{\langle s, \{\langle Bel, a, s, 1\rangle, \langle Bel, b, s, 1\rangle, \langle H, a, 3\spadesuit, 1\rangle\}\rangle\}, s\rangle$.
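
As an illustration of how such a finite system stands for an unboundedly nested object, the Python sketch below unfolds the indeterminate of the example system to a chosen depth. The tuple encoding of states of affairs, the card label `"3S"`, and the helper name `unfold` are assumptions made for this sketch only.

```python
def unfold(system, name, depth):
    """Replace indeterminates by their right-hand sides, `depth` levels deep.

    `system` maps an indeterminate to a set of states of affairs, encoded as
    ("H", agent, card, polarity) or ("Bel", agent, indeterminate, polarity).
    Occurrences deeper than `depth` are left as indeterminate names.
    """
    result = set()
    for relation, agent, arg, polarity in system[name]:
        if relation == "Bel" and arg in system and depth > 0:
            arg = unfold(system, arg, depth - 1)
        result.add((relation, agent, arg, polarity))
    return frozenset(result)


# The shared situation: s = {<Bel,a,s,1>, <Bel,b,s,1>, <H,a,3S,1>}
system = {"s": {("Bel", "a", "s", 1), ("Bel", "b", "s", 1), ("H", "a", "3S", 1)}}

level0 = unfold(system, "s", 0)  # the equation's right-hand side as written
level1 = unfold(system, "s", 1)  # beliefs now point at a copy of level0
level2 = unfold(system, "s", 2)  # beliefs now point at a copy of level1
```

Every level has the same three elements; only the nesting depth of the believed situation grows, which is why the finite equation suffices.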

$s_S$ can be unfolded infinitely as follows:




This approach is basically equivalent to Cohen et al. [7,8].

[3] proposes a slightly different formalization as the solution of the equation
$x = (s : [[H, a, 3\spadesuit, 1] \wedge [Bel, a, x, 1] \wedge [Bel, b, x, 1]])$.

Furthermore we must introduce a notion of coherence of models of shared infor- 
mation to avoid such an incoherent shared information as: (i) A and B share the 
information that A and B share no information; (ii) A and B share the information 
that A and B don’t share (ii); and so on [15]. For simplicity of the argument here, 
discussion of this problem is omitted. 
This property grasps the essence of shared information, which, in modal logic
approaches [10] to shared information or common knowledge, is axiomatized as
the Fixed Point Axiom (7a) and the Induction Rule (7b).

4.3 Modeling Resulting Information States 

We have seen a connection between moves such as acknowledgements, agreements,
and disagreements and information sharing in (1) and (2). Moreover, we
have also seen a theoretical distinction between weak agreements and strong agreements.
The resulting shared information in (1), i.e., informally written in (2), is
formalized as follows, where $s_{ack}$ is the shared information by an acknowledgement
(1b), $s_{s\_agr}$ the shared information by a strong agreement in (1a), $s_{w\_agr}$
the shared information by a weak agreement in (1a), and $s_{dis}$ the shared information
by a disagreement.

(12) a. $s_{ack} = \{\langle Bel, C, s_{ack}, 1\rangle, \langle Bel, M, s_{ack}, 1\rangle, \langle Bel, C, \{\sigma\}, 1\rangle\}$

b. $s_{s\_agr} = \{\sigma, \langle Bel, C, s_{s\_agr}, 1\rangle, \langle Bel, M, s_{s\_agr}, 1\rangle\}$

c. $s_{w\_agr} = \{\langle Bel, C, s_{w\_agr}, 1\rangle, \langle Bel, M, s_{w\_agr}, 1\rangle, \langle Bel, C, \{\sigma\}, 1\rangle,
\langle Bel, M, \{\sigma\}, 1\rangle\}$

d. $s_{dis} = \{\langle Bel, C, s_{dis}, 1\rangle, \langle Bel, M, s_{dis}, 1\rangle, \langle Bel, C, \{\sigma\}, 1\rangle,
\langle Bel, M, \{\sigma\}, 0\rangle\}$

where $C$ = Claire, $M$ = Max, $\sigma = \langle H, Bill, 3\spadesuit, 1\rangle$.

These are not denotations of agreements, acknowledgements, and disagree- 
ments, but denotations of sequences in (1). Therefore, in order to define the 
denotation of agreements, acknowledgements, and disagreements, we must con- 
sider the denotations of the initial moves in (1), i.e., 

(13) a. Claire: Does Bill have the three of spades? 
b. Claire: Bill has the three of spades. 

Usually (13a) and (13b) are interpreted as the denotation of a question, e.g., a
set of the possible answers to the question [12], and the denotation of a sentence,
i.e., a proposition, respectively. However, in dialogues, from the viewpoint of
information sharing, they are not a sentence or a question but actions such as
an assertion and a query in group acts. Furthermore, sometimes they have no
denotation with respect to information sharing, as in the following dialogues.

(14) a. Claire: Does Bill have the three of spades? Max: What? 
b. Claire: Bill has the three of spades. Max: Huh? 

Their information is not shared until the hearer replies. Therefore, as discussed 
in (3), we assume that they have the following information: 

(12) Claire thinks that Bill has the three of spades. 

This information was unshared in Claire and Max’s shared information.^^ (12) 
is formalized as follows: 



15 This information doesn’t necessarily reflect Claire’s true mental state but only information.
(13) $s_1 = \{\langle Bel, C, s_1, 1\rangle, \sigma\}$,

since the information of (3) is self-knowledge, and can be “unfolded” as follows:

$s_1 = \{\langle Bel, C, s_1, 1\rangle, \sigma\}$

$= \{\langle Bel, C, \{\langle Bel, C, s_1, 1\rangle, \sigma\}, 1\rangle, \sigma\}$

$= \{\langle Bel, C, \{\langle Bel, C, \{\langle Bel, C, \{\langle Bel, C, s_1, 1\rangle, \sigma\}, 1\rangle, \sigma\}, 1\rangle, \sigma\}, 1\rangle, \sigma\}$.



Moves such as agreements, acknowledgements, and disagreements can be
given their denotations as functions from (13) to (12). We give the definitions
in section 4.5. Before that discussion, we consider the relations between
such bits of circular information in the next section.



4.4 Ordering of Shared Information 

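Definition 4 (Barwise [2]). The hereditary subsituation relation $\sqsubseteq$ is the
largest relation on $SIT \times SIT$ satisfying the following conditions whenever $s_1 \sqsubseteq s_2$:

— if $\langle H, x, c, i\rangle \in s_1$, then $\langle H, x, c, i\rangle \in s_2$, and

— if $\langle Bel, x, t_1, 1\rangle \in s_1$, then there is $t_2$ such that $t_1 \sqsubseteq t_2$ and
$\langle Bel, x, t_2, 1\rangle \in s_2$.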

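Proposition 1. Let $s_\omega = lfp(F)$ where

$F(\emptyset) = \{\langle Bel, C, \{\sigma\}, 1\rangle\}$ and $F(X) = \{\langle Bel, C, X, 1\rangle, \langle Bel, M, X, 1\rangle\}$,

where $lfp(F)$ means the least fixed point of $F$. Then the following hold.

(1) $s \sqsubseteq s \cup t$, for any non-empty situations $s$ and $t$.

(2) $s_\omega \sqsubseteq s_{ack} \sqsubseteq s_{w\_agr} \sqsubseteq s_{s\_agr}$.

(3) $s_\omega \sqsubseteq s_{ack} \sqsubseteq s_{dis} \sqsubseteq s_{s\_agr}$.

(4) $s_{s\_agr} \not\sqsubseteq s_{w\_agr}$.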



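Proof. (1) Since $s_1 \subseteq s_2$ implies $s_1 \sqsubseteq s_2$, (1) holds. ($s_\omega \sqsubseteq s_{ack}$) By Kleene’s
theorem [9], $s_\omega$ can also be defined by recursion as follows:

$s_0 = \{\langle Bel, C, \{\sigma\}, 1\rangle\}$, $s_{\alpha+1} = F(s_\alpha)$, $s_\omega = \bigcup_{\alpha<\omega} s_\alpha$.

Since $\langle Bel, C, \{\sigma\}, 1\rangle \in s_{ack}$, $s_0 \sqsubseteq s_{ack}$. Suppose $s_\alpha \sqsubseteq s_{ack}$. To show $s_{\alpha+1} \sqsubseteq
s_{ack}$, $s_\alpha \sqsubseteq s_{ack}$ must be shown, but this is the induction hypothesis. Therefore,
$s_\omega \sqsubseteq s_{ack}$ is shown by induction. ($s_{ack} \sqsubseteq s_{w\_agr}$) It is shown by coinduction
(see footnote 10 and [4]). Suppose $R = \{\langle s_{ack}, s_{w\_agr}\rangle\} \cup \sqsubseteq$. Then $R$ satisfies
the conditions of definition 4, that is, $\langle Bel, x, s_{ack}, 1\rangle, \langle Bel, C, \{\sigma\}, 1\rangle \in s_{ack}$ and
$\langle Bel, x, s_{w\_agr}, 1\rangle \in s_{w\_agr}$ and $\langle s_{ack}, s_{w\_agr}\rangle \in R$,
for $x \in \{C, M\}$. However, since $\sqsubseteq$ is the largest relation satisfying the
conditions of definition 4, $R \subseteq \sqsubseteq$, i.e., $R = \sqsubseteq$. Therefore, $\langle s_{ack}, s_{w\_agr}\rangle \in \sqsubseteq$.
Similarly, $\langle s_{ack}, s_{dis}\rangle \in \sqsubseteq$. ($s_{dis} \sqsubseteq s_{s\_agr}$) It is also shown by coinduction.
Suppose $R = \{\langle s_{dis}, s_{s\_agr}\rangle\} \cup \sqsubseteq$. Then $R$ satisfies the conditions of definition 4,
that is, $\langle Bel, x, s_{dis}, 1\rangle, \langle Bel, C, \{\sigma\}, 1\rangle \in s_{dis}$ and $\langle Bel, x, s_{s\_agr}, 1\rangle \in s_{s\_agr}$
and $\langle \{\sigma\}, s_{s\_agr}\rangle \in R$ for $x \in \{C, M\}$. However, since $\sqsubseteq$ is the
largest relation satisfying the conditions of definition 4, $R \subseteq \sqsubseteq$, i.e., $R = \sqsubseteq$.
Therefore, $\langle s_{dis}, s_{s\_agr}\rangle \in \sqsubseteq$. Similarly, $\langle s_{w\_agr}, s_{s\_agr}\rangle \in \sqsubseteq$. (4) Suppose $R =
\{\langle s_{s\_agr}, s_{w\_agr}\rangle\} \cup \sqsubseteq$. But $R$ doesn’t satisfy the conditions of definition 4, that
is, $\sigma \in s_{s\_agr}$ but $\sigma \notin s_{w\_agr}$. □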

As Barwise [2] points out, $\sqsubseteq$ reflects the opposite ordering of the informativity
of situations.

Definition 5 (Barwise [2]). The relation $\models$ is the largest subclass of $SIT \times
SOA$ satisfying the following conditions:

— $s \models \langle H, x, c, i\rangle$ iff $\langle H, x, c, i\rangle \in s$ for $i \in \{1, 0\}$,

— $s \models \langle Bel, x, s_0, 1\rangle$ iff there is an $s_1$ such that $\langle Bel, x, s_1, 1\rangle \in s$, and for each
$\sigma_0 \in s_0$, $s_1 \models \sigma_0$, for $x \in AG$.

We extend our notation and write $s_1 \models s_2$ provided $s_1 \models \sigma_0$ for each $\sigma_0 \in s_2$.

Proposition 2 (Barwise [2]). For all situations $s_1$ and $s_2$, $s_1 \sqsubseteq s_2$ iff $s_2 \models s_1$.

Proof. See proposition 1-(5) in [2]. □

Therefore, $s_{s\_agr}$ implies $s_{w\_agr}$. But by proposition 1-(4), the reverse doesn’t
hold. That is, the Collapse Axiom discussed in section 3.4 doesn’t hold. So, we
select strong agreement as the resulting state of agreement and don’t need the Collapse
Axiom, and weak agreement is deduced from strong agreement by inference.
However, $s_{dis} \sqsubseteq s_{s\_agr}$ implies that strong agreement implies disagreement. This
must be avoided. So we revise definition 4 as follows.

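Definition 6. The hereditary subsituation relation $\sqsubseteq$ is the largest relation on
$SIT \times SIT$ satisfying the following conditions whenever $s_1 \sqsubseteq s_2$:

— if $\langle H, x, c, i\rangle \in s_1$, then $\langle H, x, c, i\rangle \in s_2$, and

— if $\langle Bel, x, t_1, i\rangle \in s_1$, then there is $t_2$ such that $t_1 \sqsubseteq t_2$ and $\langle Bel, x, t_2, i\rangle \in s_2$,
for $i \in \{1, 0\}$ and $x \in AG$.

Proposition 3. $s_{dis} \not\sqsubseteq s_{s\_agr}$.

Proof. Suppose $R = \{\langle s_{dis}, s_{s\_agr}\rangle\} \cup \sqsubseteq$. But $R$ doesn’t satisfy the conditions
of definition 6, that is, $\langle Bel, M, \{\sigma\}, 0\rangle \in s_{dis}$ but there is no $s'$ such that
$\langle Bel, M, s', 0\rangle \in s_{s\_agr}$. □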

To sum up, our semantics has the following natural properties: 

— Strong agreement implies weak agreement. 

— Weak agreement and disagreement imply acknowledgement. 

— The Collapse Axiom doesn’t hold. 
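
These properties can be checked mechanically on the four situations in (12). The Python sketch below computes the hereditary subsituation relation as a greatest fixed point: it starts from the full relation over a finite set of named situations and deletes pairs that violate the clauses. The situation names, the tuple encoding, the card label `"3S"`, and the flag `check_negative` (False for definition 4, True for definition 6) are assumptions introduced for illustration.

```python
SIGMA = ("H", "Bill", "3S", 1)  # Bill has the three of spades

# Each name denotes a situation; Bel tuples refer to situations by name.
system = {
    "sigma": {SIGMA},
    "ack": {("Bel", "C", "ack", 1), ("Bel", "M", "ack", 1),
            ("Bel", "C", "sigma", 1)},
    "s_agr": {SIGMA, ("Bel", "C", "s_agr", 1), ("Bel", "M", "s_agr", 1)},
    "w_agr": {("Bel", "C", "w_agr", 1), ("Bel", "M", "w_agr", 1),
              ("Bel", "C", "sigma", 1), ("Bel", "M", "sigma", 1)},
    "dis": {("Bel", "C", "dis", 1), ("Bel", "M", "dis", 1),
            ("Bel", "C", "sigma", 1), ("Bel", "M", "sigma", 0)},
}


def hereditary_subsituation(system, check_negative):
    """Largest relation on the named situations satisfying the clauses."""
    names = list(system)
    rel = {(s1, s2) for s1 in names for s2 in names}

    def ok(s1, s2):
        for soa in system[s1]:
            if soa[0] == "H":
                if soa not in system[s2]:
                    return False
                continue
            _, agent, t1, pol = soa
            if pol == 0 and not check_negative:
                continue  # definition 4 constrains positive beliefs only
            if not any(r == "Bel" and a == agent and p == pol and (t1, t2) in rel
                       for (r, a, t2, p) in system[s2]):
                return False
        return True

    changed = True
    while changed:
        changed = False
        for pair in list(rel):
            if not ok(*pair):
                rel.discard(pair)
                changed = True
    return rel


def4 = hereditary_subsituation(system, check_negative=False)
def6 = hereditary_subsituation(system, check_negative=True)
print(("dis", "s_agr") in def4, ("dis", "s_agr") in def6)  # True False
```

Under both settings the pairs `("ack", "w_agr")` and `("w_agr", "s_agr")` are retained and `("s_agr", "w_agr")` is rejected, while `("dis", "s_agr")` survives only when negative beliefs are ignored.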
4.5 Semantics of Moves in Dialogues

We will give a semantics of moves based on information sharing to only a small
fragment of dialogues, which is defined as follows.

$x \in AG$, $c \in CARD$

Ini ::= Does $x$ have $c$? | $x$ has $c$!,

Rep ::= Yes | No | Uh-huh, D ::= Ini; Rep,

where D is a small dialogue and ‘Yes’ means an agreement, ‘No’ a disagreement, and
‘Uh-huh’ an acknowledgement, whose denotations are functions from (13) to
(12) defined by corecursion as in definition 3.

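Definition 7. $[\![\cdot]\!]^{(y,z)}$ is a function from dialogues or moves and situations to
their denotations, where $y$ is the speaker and $z$ is the hearer, satisfying the following
conditions:

- $[\![Ini; Rep]\!]^{(y,z)} = [\![Rep]\!]^{(z,y),C(Ini)}([\![Ini]\!]^{(y,z)}(\emptyset))$,

- $[\![\text{Does } x \text{ have } c?]\!]^{(y,z)}(s) = s \cup t$, where $t = \{\langle Bel, y, t, 1\rangle, \langle H, x, c, 1\rangle\}$, and
$C(\text{Does } x \text{ have } c?) = \langle H, x, c, 1\rangle$,

- $[\![x \text{ has } c!]\!]^{(y,z)}(s) = s \cup t$, where $t = \{\langle Bel, y, t \cup \{\langle Bel, z, \{\langle H, x, c, 1\rangle\}, 1\rangle\},
1\rangle\}$, and $C(x \text{ has } c!) = \langle H, x, c, 1\rangle$,

- $[\![\text{Yes}]\!]^{(y,z),p}(s) = [\![\text{Yes}]\!]^{(y,z),p}(s) \cup \{\langle Bel, y, [\![\text{Yes}]\!]^{(y,z),p}(s), 1\rangle\}$,

- $[\![\text{Uh-huh}]\!]^{(y,z),p}(s) = [\![\text{Uh-huh}]\!]^{(y,z),p}(s) \cup \{\langle Bel, y, [\![\text{Uh-huh}]\!]^{(y,z),p}(s), 1\rangle\}$,
and $[\![\text{Uh-huh}]\!]^{(y,z),p}(\{p\}) = \{\langle Bel, z, \{p\}, 1\rangle\}$,

- $[\![\text{No}]\!]^{(y,z),p}(s) = [\![\text{No}]\!]^{(y,z),p}(s) \cup \{\langle Bel, y, [\![\text{No}]\!]^{(y,z),p}(s), 1\rangle, \langle Bel, y, \{p\}, 0\rangle\}$,
and $[\![\text{No}]\!]^{(y,z),p}(\{p\}) = \{\langle Bel, z, \{p\}, 1\rangle\}$.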

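Verification.

$[\![\text{Does } x \text{ have } c?; \text{Yes}]\!]^{(y,z)}$

$= [\![\text{Yes}]\!]^{(z,y),C(\text{Does } x \text{ have } c?)}([\![\text{Does } x \text{ have } c?]\!]^{(y,z)}(\emptyset))$

$= [\![\text{Yes}]\!]^{(z,y),p}(t)$ where $t = \{\langle Bel, y, t, 1\rangle, p\}$

$= [\![\text{Yes}]\!]^{(z,y),p}(t) \cup \{\langle Bel, z, [\![\text{Yes}]\!]^{(z,y),p}(t), 1\rangle\}$

$= \{\langle Bel, y, [\![\text{Yes}]\!]^{(z,y),p}(t), 1\rangle, p, \langle Bel, z, [\![\text{Yes}]\!]^{(z,y),p}(t), 1\rangle\}$

where $p = \langle H, x, c, 1\rangle$. This is the resulting state of an agreement.

$[\![x \text{ has } c!; \text{Uh-huh}]\!]^{(y,z)}$

$= [\![\text{Uh-huh}]\!]^{(z,y),C(x \text{ has } c!)}([\![x \text{ has } c!]\!]^{(y,z)}(\emptyset))$

$= [\![\text{Uh-huh}]\!]^{(z,y),p}(t)$ where $t = \{\langle Bel, y, t, 1\rangle, p\}$

$= [\![\text{Uh-huh}]\!]^{(z,y),p}(t) \cup \{\langle Bel, z, [\![\text{Uh-huh}]\!]^{(z,y),p}(t), 1\rangle\}$

$= \{\langle Bel, y, [\![\text{Uh-huh}]\!]^{(z,y),p}(t), 1\rangle, \langle Bel, y, \{p\}, 1\rangle, \langle Bel, z, [\![\text{Uh-huh}]\!]^{(z,y),p}(t), 1\rangle\}$

where $p = \langle H, x, c, 1\rangle$. This is the resulting state of an acknowledgement.

$[\![\text{Does } x \text{ have } c?; \text{No}]\!]^{(y,z)}$

$= [\![\text{No}]\!]^{(z,y),C(\text{Does } x \text{ have } c?)}([\![\text{Does } x \text{ have } c?]\!]^{(y,z)}(\emptyset))$

$= [\![\text{No}]\!]^{(z,y),p}(t)$ where $t = \{\langle Bel, y, t, 1\rangle, p\}$

$= [\![\text{No}]\!]^{(z,y),p}(t) \cup \{\langle Bel, z, [\![\text{No}]\!]^{(z,y),p}(t), 1\rangle, \langle Bel, z, \{p\}, 0\rangle\}$

$= \{\langle Bel, y, [\![\text{No}]\!]^{(z,y),p}(t), 1\rangle, \langle Bel, y, \{p\}, 1\rangle, \langle Bel, z, [\![\text{No}]\!]^{(z,y),p}(t), 1\rangle, \langle Bel, z, \{p\}, 0\rangle\}$

where $p = \langle H, x, c, 1\rangle$. This is the resulting state of a disagreement. □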

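Definition 8. Let $D_1, D_2$ be dialogues. $D_1 \equiv D_2$ iff $[\![D_1]\!]^{(y,z)} \sqsubseteq [\![D_2]\!]^{(y,z)}$ and
$[\![D_2]\!]^{(y,z)} \sqsubseteq [\![D_1]\!]^{(y,z)}$.

Our semantics has the property discussed in 3.1 as follows.

Proposition 4. Does $x$ have $c?_y$; Yes$_z$ $\equiv$ $x$ has $c!_y$; Yes$_z$ $\equiv$ Does $x$ have $c?_z$;
Yes$_y$ $\equiv$ $x$ has $c!_z$; Yes$_y$.

Proof. Directly from definitions 7 and 8. □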

5 Conclusion 

We have seen a formal and compositional semantics of agreements, acknowl- 
edgements, and disagreements based on information sharing. Our main claim is 
that these communicative actions bear not only information of the actors’ infor- 
mation, but also form shared information. That is, in a communication among 
agents, this semantics interprets agreements as update functions from message 
senders’ beliefs to the shared belief of the content of the message among the 
agents, acknowledgements as update functions from message senders’ beliefs to 
the shared belief of his/her belief among the agents, and disagreements as update 
functions from message senders’ beliefs to the shared belief of the contradictory 
beliefs among the agents. If we consider only the former part of information of 
these actions, we will meet the Collapse Problem as discussed in section 3.4. 
Such an update of shared beliefs also involves the problem of the finiteness of 
the actions as seen in section 3. This semantics avoids such problems due to 
the formalization of shared beliefs as circular finite objects and their corecursive 
definition. It is shown that this semantics can guarantee the inferential rela- 
tion between acknowledgements, agreements, and disagreements and a process 
equivalence among dialogues with respect to resulting shared beliefs. 

However, although the finiteness of the acknowledgements is related to a
notorious problem, the Byzantine Agreement Problem of distributed systems [10],
our semantics isn’t concerned with how to achieve agreements in
a distributed system, but focuses on what achieving agreement means, so the
connection of this semantics with the Byzantine Agreement Problem
still remains to be clarified. Another issue that remains unsolved is the problem
of how to form fundamental agreements such as Conversation Policies [11], i.e.,
sharing how to communicate among agents, which can be considered a kind
of agreement. A full treatment of these issues is left for future work.
Abstract. The optimal decision for an agent in a given game situation depends
on the decisions of other agents at the same time. Rational agents will find a 
stable equilibrium before taking an action, according to the assumption of 
rationality. We suggest that the rational agents can use the negotiation 
mechanism to reach the equilibrium. In previous works, we proposed the 
communication actions of guarantee and compensation to convince or persuade 
other agents with a trusted third party mediating the games. In this paper, we 
extend the negotiation mechanism to deal with n-by-n games and justify its 
optimality with the underlying assumptions. During the negotiation process, 
each agent makes suggestions on how they can reach equilibrium while
maximizing its own payoff. The mechanism can deal with all the game
situations and find an acceptable equilibrium that gives optimal payoffs for the
agents. 



1. Introduction 

In a multi-agent community, the result of an agent’s action depends on how
other agents act. Therefore, each agent must model the other agents’ decisions in order to
find the best strategy and obtain the optimal outcome. Game theory provides a way to
model and reason about this situation. A game matrix represents the
outcome of every combination of strategies. According to game theory, there are
games that have a unique stable equilibrium, but also some that have no stable
equilibrium at all. Besides, multiple equilibria and the prisoner’s dilemma are two other
well-known difficult game situations. If the game payoffs are transferable, a
negotiation protocol may help agents to solve the difficult games by changing the
game matrix. If the game has no stable equilibrium point, the agents should create one,
and the payoffs of that equilibrium should be acceptable. Reaching an agreement on
a certain equilibrium point is very important; since the agents may not always keep their
commitments, changing the game matrix is a way to enforce the commitment. The
stable equilibrium is often known as the Nash equilibrium, so in the rest
of the paper we treat the two terms as equivalent. In difficult game situations, agents
should try to negotiate and change the original difficult game situation into a new
game situation in which an acceptable Nash equilibrium exists. For example, the

H. Nakashima, C. Zhang (Eds.): PRIMA99, LNAI 1733, pp. 47-61, 1999. 
Fig. 1. The n-by-n game matrix that models a two-agent decision-making situation
where each agent has three strategies. (a) The game matrix without Nash
equilibrium. (b) The game matrix with multiple Nash equilibria.



game matrix in Fig. 1 (a) is a game matrix without Nash equilibrium and the game
matrix in Fig. 1 (b) is a game matrix with multiple Nash equilibria. Agents in these
games need some negotiation mechanism to coordinate in order to reach a
stable equilibrium point on which everyone agrees. Traditional game theory assumes that
there is no communication and that both agents are mutually rational, so that they can find
a mixed strategy Nash equilibrium. In this paper, we will show how the negotiation
mechanism can help the agents to find a pure strategy whose payoff is no lower than
that of the mixed strategy.

In previous works [26] [28], we proposed a negotiation protocol that involved a
trusted third party. The mechanism can deal with all three difficult game situations in
2-by-2 games: no equilibrium, multiple equilibria and Prisoner’s dilemma games. The
mechanism is based on some assumptions. These assumptions can be justified as
rational, so that agents that adopt them are rational. In this paper, we explore the result
of negotiation in different n-by-n game situations. An n-by-n game can model more
strategies for each agent; some high-level strategies such as “not to play the game” or
“delay the decision” can also be modeled in one game. This forces the agent to make a
rational decision at this time, which is quite different from the 2-by-2 game.

Multi-agent coordination is necessary for a multi-agent community to prevent
anarchy or chaos [5] [8] [13] [21]. Negotiation is a way to coordinate rational agents
under a game theoretical deal-making mechanism [7] [17] [18] [22]. Game theory can
be used to guide the negotiation [16] [30] [31] [32]. The underlying assumption is that
agents model each other as rational agents and make decisions based on game
theory [1] [2] [9] [20] [23]. Leveled commitment is a way to bind the commitment in
multi-agent contracting, which allows punishment for de-commitment in the
future [19]. If there are different types of agents, a recursive modeling method may
help in decision-making [4] [6] [24] [25]. Issues related to uncertain games are usually
not well addressed [29]; fuzzy theory is a way to deal with uncertainty in a fuzzy
game [27].
Section 2 defines negotiation games and the negotiation mechanisms.
Section 3 shows how to apply the negotiation in n-by-n games. Section 4 presents a scenario
of negotiation to change the equilibrium point in an n-by-n game. Section 5 discusses
the results, and section 6 concludes.



2. Negotiation Games 

2.1 The Negotiation Game 

A negotiation game is a traditional game with a negotiation mechanism. Agents in the
game can analyze the game and try to find an equilibrium before taking any real
action. If no acceptable equilibrium can be found in the original game, the agents will
try to create one by the negotiation mechanism.



Definition (A Negotiation Game)

A one-shot negotiation game can be defined as a tuple $\langle A, S, P, N\rangle$, where $A = \{P, Q\}$
is a set of agents, $S$ is the set of strategy sets for the agents in $A$, $S = \{S_p, S_q\}$,
$S_p = \{P_1, P_2, \ldots, P_n\}$, $S_q = \{Q_1, Q_2, \ldots, Q_n\}$, $P_p : S_p \times S_q \to R$ and
$P_q : S_p \times S_q \to R$ are the payoff functions
that map each combination of strategies to a payoff value for each agent, and $N$ is a
negotiation mechanism. The payoff functions of each agent on each combination of
strategies form a game matrix. Agents may use the negotiation mechanism to alter the
game matrix and find an acceptable equilibrium.
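
As a minimal sketch of the game matrix part of this definition, the Python code below stores the two payoff functions as nested lists and enumerates the pure-strategy Nash equilibria. The function name `pure_nash_equilibria` and the payoff numbers are hypothetical and chosen only for illustration; the negotiation mechanism itself is not modeled here.

```python
from itertools import product


def pure_nash_equilibria(payoff_p, payoff_q):
    """List the cells (i, j) where neither agent gains by deviating alone.

    payoff_p[i][j] and payoff_q[i][j] are the payoffs of agents P and Q when
    P plays its i-th strategy (row) and Q plays its j-th strategy (column).
    """
    n_rows, n_cols = len(payoff_p), len(payoff_p[0])
    equilibria = []
    for i, j in product(range(n_rows), range(n_cols)):
        # P cannot improve by switching rows while Q stays in column j.
        p_best = all(payoff_p[i][j] >= payoff_p[k][j] for k in range(n_rows))
        # Q cannot improve by switching columns while P stays in row i.
        q_best = all(payoff_q[i][j] >= payoff_q[i][k] for k in range(n_cols))
        if p_best and q_best:
            equilibria.append((i, j))
    return equilibria


# A prisoner's dilemma (strategy 0 = cooperate, 1 = defect).
pd_p = [[3, 0], [5, 1]]
pd_q = [[3, 5], [0, 1]]
print(pure_nash_equilibria(pd_p, pd_q))  # [(1, 1)]

# Matching pennies: no pure-strategy equilibrium.
mp_p = [[1, -1], [-1, 1]]
mp_q = [[-1, 1], [1, -1]]
print(pure_nash_equilibria(mp_p, mp_q))  # []
```

The two examples correspond to two of the difficult situations named in the text: an equilibrium with unsatisfactory payoffs, and a game with no stable pure equilibrium.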

The negotiation mechanism is an external part of the game; in different situations,
the mechanism must be reconstructed. We proposed a mechanism that includes two
communication actions and a trusted third party. The two communication actions,
guarantee and compensation, provide a way to trade payoffs and therefore change the
game matrix from a difficult game into a desirable game.



2.2 The Role of the Trusted Third Party 

The basic idea of our work is that agents may trade payoffs for a better outcome.
This is very similar to everyday bargaining. The existence of a trusted third party is
very important. Since agents are assumed rational, a rational agent acts for its own
interest only and is an expected payoff maximizer. There are game situations that
cannot be improved without a trusted third party. For example, in a prisoner’s dilemma
game, communication between the two agents alone cannot help them escape the
dilemma. Even if the agents tell each other that they will play the cooperative strategy,
each agent will still try to maximize its own payoff, so the agents will play the defect
strategy.
Fig. 2. The change of Nash equilibrium due to the guarantee and compensation
communication actions. (a), (c) The original game matrix. (b) Agent P pays a guarantee
2 on not to play P2. (d) Agent P offers a compensation 2 to agent Q on playing Q1.



In the real world, a trusted third party is necessary in many trade processes. If a trade
process cannot be finished at once and there is a possibility of playing the defect strategy,
then there is a need for a trusted third party. For example, banks often play the role
of the trusted third party in the selling and buying of real estate. The trusted
third party plays a role in the trade but is neither the seller nor
the buyer. The trusted third party can ensure the commitment of both agents to a trade in
a domain-independent manner.

In our work, the trusted third party provides additional help for the enforcement of
commitment. In non-cooperative games, difficult game situations can provide a
self-enforcement of making a cooperative decision. Agents use the communication actions
to alter the difficult game situations into better game situations, and this mechanism
cannot work without a trusted third party.



2.3 The Guarantee, Compensation Communication Actions 

In our previous work, we defined two communication actions that can alter the game
matrix, so that the game can be changed from a difficult game into one with an
acceptable stable equilibrium. The two communication actions, guarantee and
compensation, can help to coordinate rational agents [26] [28]. The mechanism
involves a trusted third party.

The guarantee communication action is a way to prevent an agent from playing a
strategy that will lead to a worse result for another agent. As in Fig. 3 (a), if a strategy
S1 of agent Q may lead to a lower payoff for agent P, then agent P may
ask agent Q to deposit a guarantee G at the trusted third party. The guarantee forbids
agent Q to play S1. If agent Q agrees and deposits the guarantee at the trusted third
party, then the trusted third party checks whether agent Q keeps the commitment and
decides accordingly whether to return the guarantee. If agent Q plays S1, the guarantee G
will not be returned. If agent Q keeps the commitment, the guarantee will be returned.
Therefore, if agent Q deposits the guarantee, then the game matrix for agent Q is changed
and the payoffs associated with the strategy S1 decrease by G. The guarantee
communication action in the n-by-n game is different from that in 2-by-2 games. In a
2-by-2 game, the guarantee for playing one strategy implies not playing the other strategy.
In the n-by-n game, however, the guarantee of playing a certain strategy must be defined
in terms of not playing many other strategies. This is called the multiple forbidden
property. It can be achieved by indicating the list of forbidden strategies. A guarantee
action changes the game matrix. As in Fig. 2 (a) and (b), when agent P pays a guarantee G
for a certain strategy $S_p^k$, the payoffs of that strategy remain the same but the payoffs
of the other strategies decrease. This means that all the strategy combinations of the
other strategies are changed: for all of agent Q’s strategies $S_q$ and all of agent P’s
strategies $S_p^i$ other than the certain $S_p^k$, $Payoff_p(S_p^i, S_q)$ becomes
$Payoff_p(S_p^i, S_q) - G$.
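
The effect of a guarantee on the depositing agent's payoffs can be sketched in a few lines of Python. The function name `apply_guarantee`, the row-per-strategy layout, and the numbers are hypothetical; the sketch assumes that every strategy other than the guaranteed one is forbidden.

```python
def apply_guarantee(payoff, kept_strategy, amount):
    """Return the depositing agent's payoff matrix after a guarantee.

    `payoff[i][j]` is the agent's payoff when it plays its i-th strategy and
    the other agent plays its j-th strategy. Every row other than
    `kept_strategy` loses `amount`, because playing it forfeits the deposit.
    The input matrix is left unchanged.
    """
    return [
        [value - (0 if i == kept_strategy else amount) for value in row]
        for i, row in enumerate(payoff)
    ]


# Agent P's payoffs in a prisoner's dilemma (row 0 = cooperate, 1 = defect).
payoff_p = [[3, 0], [5, 1]]
after = apply_guarantee(payoff_p, kept_strategy=0, amount=3)
print(after)  # [[3, 0], [2, -2]]
```

With the deposit in place, defecting against a cooperating opponent now pays 2 instead of 5, so cooperation becomes the better reply in that column.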

The compensation communication action, on the other hand, is used to persuade an
agent to play a certain strategy that can lead to a desirable state. As in Fig. 3 (b), if a
strategy S2 of agent Q may lead to a better payoff for agent P, then agent P
may offer some compensation C to persuade agent Q to play strategy S2. The
compensation is deposited at the trusted third party temporarily. The compensation
will be sent to agent Q if agent Q does play the requested strategy S2. If agent Q
accepts the offer, we can say the game matrix is changed. The payoffs associated with
S2 for agent P decrease by C and the payoffs associated with S2 for agent Q increase by C.
With this negotiation protocol, the rational agents can reach a Pareto-efficient and
(a) The guarantee communication action

(b) The compensation communication action

Fig. 3. (a) The guarantee and (b) the compensation communication actions are used
together with a trusted third party in negotiation.



Nash equilibrium in all 2-by-2 games. There is a constraint, called the proper quantum
principle, on the minimal amount of compensation that ensures the negotiation will
end within finite time. A compensation action changes the game matrix.
As in Fig. 2 (c) and (d), when agent P pays a compensation C for a certain strategy
$S_q^k$, the payoffs of that strategy are changed but the payoffs of the
other strategies remain the same. This means that all the strategy combinations of the
$S_q^k$ strategy are changed: for agent Q’s strategy $S_q^k$ and all of agent P’s
strategies $S_p^i$, $Payoff_p(S_p^i, S_q^k)$ becomes $Payoff_p(S_p^i, S_q^k) - C$ and
$Payoff_q(S_p^i, S_q^k)$ becomes $Payoff_q(S_p^i, S_q^k) + C$.
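
The compensation update can be sketched in the same style. The function name `apply_compensation`, the matrix layout (rows for agent P's strategies, columns for agent Q's), and the numbers are assumptions made for illustration.

```python
def apply_compensation(payoff_p, payoff_q, target_column, amount):
    """Return both payoff matrices after P compensates Q for one strategy.

    Rows index agent P's strategies and columns index agent Q's strategies.
    In the compensated column, P's payoffs drop by `amount` and Q's payoffs
    rise by `amount`; all other cells are unchanged.
    """
    new_p = [
        [v - amount if j == target_column else v for j, v in enumerate(row)]
        for row in payoff_p
    ]
    new_q = [
        [v + amount if j == target_column else v for j, v in enumerate(row)]
        for row in payoff_q
    ]
    return new_p, new_q


payoff_p = [[4, 1], [3, 0]]
payoff_q = [[1, 2], [0, 3]]
new_p, new_q = apply_compensation(payoff_p, payoff_q, target_column=0, amount=2)
print(new_p)  # [[2, 1], [1, 0]]
print(new_q)  # [[3, 2], [2, 3]]
```

Because the amount is only transferred between the two agents, the total payoff of every cell is preserved, which reflects the assumption that payoffs are transferable.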
Proper quantum principle: The minimal amount of the compensation is called a
quantum. Since a basic quantum of payoff may not exist in general cases, any small
but significant enough amount can be accepted the first time. This principle implies
that the next compensation should not be less than the amount that another agent
previously offered. This principle is necessary to prevent an agent from offering such
a small compensation that the negotiation process becomes lengthy.



2.4 The General Negotiation Protocol 

The protocol is symmetric: one agent makes a suggestion, then the other agent accepts
it or makes a counter suggestion. There could be many different ways to make a
suggestion and different criteria for whether to accept a suggestion or not. We will
give our approach and discuss it later. The general protocol of the negotiation is:

Procedure: Negotiation Protocol

Input: the game matrix without an acceptable stable equilibrium
Output: a game matrix with an acceptable stable equilibrium
Step 1. Construct the game matrix. Then enter the loop of making a suggestion and
accepting it or making a counter-suggestion.
Step 2. Make a suggestion that forms an acceptable equilibrium point, using guarantee
and/or compensation communication actions based on the mutual rationality
assumption.
Step 3. The other agent decides whether to accept it under some criteria.
Step 4. If there are better suggestions, go to Step 2.
Step 5. If there is no better suggestion, accept the last suggestion.

End of the Protocol
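
The input and output conditions of the protocol are stated in terms of equilibrium
points of the game matrix. A minimal Python sketch of how the pure-strategy Nash
equilibrium points of a two-agent matrix could be enumerated is given below. The
function name, the matrix layout (rows for agent P, columns for agent Q), and both
example games are assumptions for illustration only.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def pure_nash_equilibria(payoff_p: Matrix, payoff_q: Matrix) -> List[Tuple[int, int]]:
    """Return all cells (i, j) from which neither agent gains by deviating alone."""
    rows, cols = len(payoff_p), len(payoff_p[0])
    result = []
    for i in range(rows):
        for j in range(cols):
            # P cannot do better by switching rows while Q stays in column j.
            p_stays = all(payoff_p[i][j] >= payoff_p[k][j] for k in range(rows))
            # Q cannot do better by switching columns while P stays in row i.
            q_stays = all(payoff_q[i][j] >= payoff_q[i][l] for l in range(cols))
            if p_stays and q_stays:
                result.append((i, j))
    return result


# Hypothetical game with no pure equilibrium (matching pennies).
no_equilibrium = pure_nash_equilibria([[1, -1], [-1, 1]], [[-1, 1], [1, -1]])

# Hypothetical game with two pure equilibria (a coordination game).
two_equilibria = pure_nash_equilibria([[2, 0], [0, 1]], [[1, 0], [0, 2]])
```

Both example games are of the kind the protocol takes as input: the first has no
stable point to agree on, and the second has two competing ones.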

Different criteria for making, accepting, and rejecting suggestions may lead to
different negotiation results. Here we provide one set of rules for making a suggestion
and for deciding whether to accept or reject it. First, an agent forms a suggestion by
picking the combination of strategies that gives the highest total payoff. The
suggestion includes the associated guarantee or compensation actions that make
this state a Nash equilibrium point. When agents offer compensation, they follow
the proper quantum principle. Second, an agent accepts only a suggestion that gives
it a higher payoff than its own suggestion and rejects all other suggestions. These
rules imply that the negotiated result must be a Nash equilibrium that gives the
highest total payoff while each agent pursues its own interest.
These rules can escape the prisoner’s dilemma and end up with the same payoffs in
games with multiple Nash equilibria, no matter which agent initiates the negotiation [28].
In this paper, we do not limit the agents to these rules, since rules that take
different preferences into consideration may give different results. For example, if
finishing the negotiation as soon as possible is the most important goal when seeking
a stable and acceptable equilibrium point, accepting the first suggestion is a possible
rule. If the two agents differ in their eagerness to reach a final negotiated result,
the outcome of the negotiation may depend on who initiates it.



3. Applying the Negotiation Mechanism to N-by-N Games

This section describes how the negotiation mechanism works in different n-by-n
games.



3.1 Escape from the Prisoner’s Dilemma in an N-by-N Game 

The prisoner’s dilemma (PD) is a special game in the game theory literature [14], since
many phenomena can be modeled by the PD game [1], such as multi-agent
coordination [10] [11]. In a PD game, each agent chooses its own dominant
strategy, but the resulting outcome (combination of strategies) is worse for both. In
a 2-by-2 prisoner’s dilemma game, agents ask for a guarantee that the other will play
the cooperative strategy rather than the defect strategy. Thus, the agents can escape
the dilemma and reach a new equilibrium that gives higher payoffs to both agents.

A prisoner’s dilemma game involves two agents, each with two strategies to
play: the cooperative and the defect strategies. The payoffs of the combinations
(cooperative, cooperative), (cooperative, defect), (defect, cooperative), and (defect,
defect) are (a, a), (b, c), (c, b), and (d, d) respectively, where c > a > d > b. A standard
prisoner’s dilemma game requires 2a > b+c; the cases in which 2a < b+c are also
called prisoner’s dilemmas. In a prisoner’s dilemma game, the agents play the
dominant strategies (defect, defect), since c > a and d > b, and get the payoffs (d, d)
while bypassing the better payoffs (a, a).

With a trusted third party as an intermediary agent, the guarantee
communication action alone is enough for the agents to escape from a standard
prisoner’s dilemma in a 2-by-2 game. The agents can simply deposit, with the trusted
third party, a guarantee of not playing the defect strategy. The deposit reduces the
payoff of playing the defect strategy enough that defect is no longer a
dominant strategy. The combination (defect, defect) is then no longer a Nash
equilibrium point, and the agents will eventually find another equilibrium. After
the agents pay the guarantee, there may be several new Nash equilibria with
higher payoffs than the original dilemma Nash equilibrium.
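
The escape from a 2-by-2 prisoner's dilemma can be checked numerically with a short
Python sketch. The payoff values a = 3, b = 0, c = 5, d = 1, the guarantee amount 2.5,
and the helper names are hypothetical. The sketch also assumes that an agent who plays
defect simply loses its deposit; where the forfeited deposit goes is not modeled.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def pure_nash(payoff_p: Matrix, payoff_q: Matrix) -> List[Tuple[int, int]]:
    """Cells (i, j) where neither agent gains by deviating unilaterally."""
    n, m = len(payoff_p), len(payoff_p[0])
    return [(i, j) for i in range(n) for j in range(m)
            if all(payoff_p[i][j] >= payoff_p[k][j] for k in range(n))
            and all(payoff_q[i][j] >= payoff_q[i][l] for l in range(m))]


def deposit_guarantee(payoff_p: Matrix, payoff_q: Matrix, g: float) -> Tuple[Matrix, Matrix]:
    """Index 0 = cooperate, 1 = defect. An agent that defects forfeits g."""
    new_p = [row[:] for row in payoff_p]
    new_q = [row[:] for row in payoff_q]
    for j in range(2):
        new_p[1][j] -= g  # P plays defect (row 1)
    for i in range(2):
        new_q[i][1] -= g  # Q plays defect (column 1)
    return new_p, new_q


a, b, c, d = 3.0, 0.0, 5.0, 1.0          # satisfies c > a > d > b and 2a > b + c
P = [[a, b], [c, d]]                      # rows: P cooperates / defects
Q = [[a, c], [b, d]]                      # columns: Q cooperates / defects
before = pure_nash(P, Q)
P_g, Q_g = deposit_guarantee(P, Q, g=2.5)  # g exceeds both c - a and d - b
after = pure_nash(P_g, Q_g)
```

With these values the only equilibrium moves from (defect, defect) to (cooperate,
cooperate), because the guarantee exceeds both c − a and d − b.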

If more than two strategies are available to each agent, the definition of a
prisoner’s dilemma game can be extended from a 2-by-2 game to an n-by-n game.



Definition (Generalized Prisoner’s dilemma game) 

A generalized prisoner’s dilemma game is a game in which there is only one Nash
equilibrium point, and that equilibrium point is Pareto dominated by another strategy
combination of the game.
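
The definition can be turned into a small test, sketched here in Python under two
assumptions: only pure strategies are considered, and "Pareto dominated" means that
some other cell is at least as good for both agents and strictly better for at least
one. The 3-by-3 payoff matrix is a hypothetical example.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def pure_nash(payoff_p: Matrix, payoff_q: Matrix) -> List[Tuple[int, int]]:
    """Cells (i, j) where neither agent gains by deviating unilaterally."""
    n, m = len(payoff_p), len(payoff_p[0])
    return [(i, j) for i in range(n) for j in range(m)
            if all(payoff_p[i][j] >= payoff_p[k][j] for k in range(n))
            and all(payoff_q[i][j] >= payoff_q[i][l] for l in range(m))]


def is_generalized_pd(payoff_p: Matrix, payoff_q: Matrix) -> bool:
    """True if the unique pure equilibrium is Pareto dominated by another cell."""
    equilibria = pure_nash(payoff_p, payoff_q)
    if len(equilibria) != 1:
        return False
    ei, ej = equilibria[0]
    ep, eq = payoff_p[ei][ej], payoff_q[ei][ej]
    for i in range(len(payoff_p)):
        for j in range(len(payoff_p[0])):
            p, q = payoff_p[i][j], payoff_q[i][j]
            if p >= ep and q >= eq and (p > ep or q > eq):
                return True
    return False


# Hypothetical symmetric 3-by-3 game: Q's payoff matrix is the transpose of P's.
P = [[3, 0, 0], [5, 1, 0], [0, 0, -1]]
Q = [[P[j][i] for j in range(3)] for i in range(3)]
dilemma = is_generalized_pd(P, Q)

# A coordination game has two equilibria, so it is not a generalized PD.
not_dilemma = is_generalized_pd([[2, 0], [0, 1]], [[1, 0], [0, 2]])
```

In the 3-by-3 example the single equilibrium is the cell (1, 1) with payoffs (1, 1),
which is dominated by the cell (0, 0) with payoffs (3, 3).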

For the generalized prisoner’s dilemma game, however, the guarantee of not
playing the defect strategy alone is not enough to find an optimal decision; the
compensation action is necessary. An additional constraint defines a generalized
standard prisoner’s dilemma game: the sum of the payoffs for any combination of
strategies that involves the defect strategy should be less than the sum of the payoffs
for some combination of strategies that involves no defect strategy. In that case,
guarantee communication alone can help the agents escape from the generalized
standard prisoner’s dilemma.



3.2 Choose or Create One Nash Equilibrium Point in an N-by-N Game 

The technique of choosing or creating a Nash equilibrium point in 2-by-2 games can
be directly applied to n-by-n games. Each agent may suggest a certain combination of
strategies as a candidate stable equilibrium, and in the negotiation process the
candidates may be rejected or accepted by the agents. If no mutual agreement is reached,
offering compensation to persuade each other is a way to reach a new
compromise state. Each suggestion may be accompanied by guarantee or
compensation communication actions that ensure the suggestion is a Nash equilibrium
point. Both guarantee and compensation are useful. We will show a scenario of the
negotiation process in an n-by-n game in which there is already a Nash equilibrium
point, but with the help of guarantee and compensation, agents can form a new Nash
equilibrium point and get higher payoffs.



4. A Scenario of Negotiation of Changing the Nash Equilibrium 
in an N-by-N Game 

In this section, we show an example of using the suggestion, rejection, and acceptance
actions together with the guarantee and compensation communication actions to help
agents get higher payoffs when seeking a Nash equilibrium. In this example, we assume
that each agent suggests and accepts only the strategy combination that gives the
highest payoff for itself, and that, if there is a need to persuade the other agent,
agents offer the minimum required compensation. The game matrix in Fig. 4 (a) has one
Nash equilibrium at strategy combination (Q1, P3), where the payoff for (Q, P) is (2, 3).
We now show how the negotiation can alter the game and reach a new
Nash equilibrium point that gives better payoffs (3, 4). The negotiation procedure
proceeds as follows (for simplicity, the communications with the trusted third
party for the guarantee and compensation actions are omitted):

1. Agent P suggests (Q1, P3), since it gives the highest payoff for agent P and it is
a Nash equilibrium point.

2. Agent Q rejects the suggestion, since it does not give the highest payoff for
agent Q.

3. Agent Q suggests (Q2, P1) and asks for a guarantee of 3.1 from P that P will not
play P2 or P3. The suggestion (Q2, P1) gives the highest payoff for agent Q, but it
is not a Nash equilibrium point. A guarantee of 3.1 for P not to play P2 or P3
makes it a Nash equilibrium point.

4. Agent P rejects the suggestion, since it does not give the highest payoff for
agent P.

5. Agent P suggests (Q1, P3) again, since it gives the highest payoff for agent P
and it is a Nash equilibrium point.

6. Agent Q rejects again.

7. Agent Q suggests (Q2, P1), since it gives the highest payoff for agent Q, and
offers a compensation of 3.1 for P to play P1 in order to persuade agent P to leave
(Q1, P3) and play (Q2, P1).

8. Agent P accepts the compensation, since the new suggestion gives a higher
payoff than it could get from its previous suggestion, as indicated in Fig. 4 (b).

9. Agent P finds that there is a better strategy combination (Q1, P1) after
accepting the compensation from agent Q. Therefore, agent P suggests (Q1, P1)
and offers a compensation of 1.1 for Q to play Q1, since this gives the highest
payoff for agent P and can persuade agent Q to leave (Q2, P1) and play (Q1, P1).

10. Agent Q accepts the compensation and makes no further suggestion, since any
other strategy combination would be dominated by (Q1, P1) if agent Q tried
to offer some compensation to persuade agent P, as shown in Fig. 4 (c).

11. Both agents reach a compromise that is a new Nash equilibrium point and get
higher payoffs (3, 4).

Fig. 4. An n-by-n game matrix that models a two-agent decision-making situation with
one Nash equilibrium, where (a) is the original game matrix, (b) is the game matrix after
Q offers a compensation of 3.1 for P to play P1, and (c) is the game matrix after P
accepts the suggestion and offers a compensation of 1.1 for Q to play Q1.

The result of the negotiation is that agent P pays a compensation of 1.1 for Q to play
Q1, while agent Q pays a compensation of 3.1 for P to play P1. Agent Q and agent P
end up with payoffs of 3 and 4 respectively, in the same (Q, P) order as above. At each
step, an agent suggests a strategy combination that gives a better payoff for itself.
Each suggestion must be associated with communication actions that lead to a Nash
equilibrium point. As shown in this example, every suggestion leads to a
Nash equilibrium, but the payoffs differ across the Nash equilibrium points.
The final negotiated Nash equilibrium point gives each agent a higher payoff than
the original Nash equilibrium point. The agent always suggests the
strategy combination that gives the highest payoff for itself (a rational agent). The
agent accepts the other agent’s suggestion only when it gives a higher payoff than the
previous suggestion made by itself. The suggestion always goes with either a
guarantee or a compensation action that makes the suggestion a Nash equilibrium
point.



5. Discussion 

5.1 The Need of Negotiation 

In a negotiation game, agents try to reach an equilibrium point before they play
the game. In different game situations, agents need to apply different negotiation
mechanisms to find an equilibrium point. The benefit of
negotiation for escaping from the prisoner’s dilemma is obvious. Here we discuss the
other two difficult game situations, those with no or multiple Nash equilibria.

In games with no or multiple Nash equilibria, if the agents cannot reach a
compromise, they can only play the mixed strategy defined by Nash in
game theory. Each agent has a vector of probabilities, one for every strategy
that the agent may play. Nash proved the existence of a mixed-strategy Nash
equilibrium for each game. The vector also helps to calculate the expected payoff.
Since the expected payoff is a weighted sum of the payoffs of the possible outcomes,
the sum of both agents’ expected payoffs must, except in a zero-sum or constant-sum
game, be less than the highest sum in the game. Therefore, there is a
driving force for the agents to reach the state that gives the highest sum of
payoffs for both, and to offer some compensation so that each may get a higher payoff
than by playing the mixed strategy. In a zero-sum or constant-sum game, the negotiation
mechanism can also help the agents reach a Nash equilibrium point in which each
agent gets the same payoff as when the agents play the mixed strategies.
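
The expected payoff under mixed strategies is a probability-weighted sum over all
outcomes, which can be sketched in Python as follows. The 2-by-2 game and the function
name are hypothetical. The mixing probabilities are the mixed-strategy equilibrium of
this particular example game, obtained by making each agent indifferent between its
two strategies.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def expected_payoffs(payoff_p: Matrix, payoff_q: Matrix,
                     mix_p: Sequence[Fraction],
                     mix_q: Sequence[Fraction]) -> Tuple[Fraction, Fraction]:
    """Expected payoffs when P plays row i with mix_p[i] and Q column j with mix_q[j]."""
    exp_p = Fraction(0)
    exp_q = Fraction(0)
    for i, pi in enumerate(mix_p):
        for j, qj in enumerate(mix_q):
            weight = pi * qj              # probability of outcome (i, j)
            exp_p += weight * payoff_p[i][j]
            exp_q += weight * payoff_q[i][j]
    return exp_p, exp_q


# Hypothetical coordination game with two pure equilibria.
P = [[2, 0], [0, 1]]
Q = [[1, 0], [0, 2]]
mix_p = [Fraction(2, 3), Fraction(1, 3)]  # makes Q indifferent between its columns
mix_q = [Fraction(1, 3), Fraction(2, 3)]  # makes P indifferent between its rows

exp_p, exp_q = expected_payoffs(P, Q, mix_p, mix_q)
best_total = max(P[i][j] + Q[i][j] for i in range(2) for j in range(2))
```

In this example each agent expects 2/3, so the expected total of 4/3 is well below the
highest total of 3 available in the matrix, which is the gap that negotiation with
compensation aims to close.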

For example, consider the constant-sum game matrix in Fig. 1 (a). The expected
payoff of the game, (4, 2), can be deduced from the symmetric game matrix. The best
negotiated payoff is also (4, 2). It can be reached by agent Q suggesting (Q1, P1),
offering a compensation of 1, and asking for a guarantee of 1.1 for agent P to play P3.
Or it can be reached by agent P suggesting (Q1, P3), offering a compensation of 1, and
asking for a guarantee of 1.1 for agent Q to play Q1. The two ways lead to two different
Nash equilibrium points, but the final payoff for each agent remains the same.

In another example, consider the matrix in Fig. 1 (b), where there are multiple Nash
equilibrium points. No mixed strategy can give the agents better payoffs than
the negotiated payoffs (6, 5). The final state of the negotiation is that agent Q
suggests (Q2, P1) while offering a compensation of 2 and asking for a guarantee of 0.1
for agent P to play P1. These actions make (Q2, P1) a Nash equilibrium point, and each
agent gets a higher payoff than at the original Nash equilibrium points.



5.2 The Suggest, Accept and Reject Actions

The suggesting, accepting, and rejecting actions can be very versatile
in actual human negotiation tactics. We give just one way that is feasible for software
agents. Agents make suggestions that belong to the Pareto-efficient set and reject
other suggestions. Agents accept only the suggestions that give the highest payoff
after taking the compensation and guarantee into account. We can give some
important results under the assumption of the proper quantum principle, but the proper
quantum principle is not always a fair assumption. If the agents have the right to offer
compensation on different scales, the result will be quite different. New
considerations and new communication actions may be needed, for example, taking the
loss of not reaching a compromise into account, or adding threat or stand-still as new
actions.



5.3 The Underlying Assumptions 

To apply the negotiation game as a coordination tool, several assumptions
are needed to make the negotiation game valid.

1. The game matrix can be constructed.

2. The agents are assumed to be mutually rational.

3. The game payoffs are transferable.

4. The trusted third party exists and can take the communication actions.

5. The results of the actions that agents take are observable by all the agents.

The first two assumptions are adopted from traditional game theory. In
traditional game theory, the payoffs are not transferable. However, if we
postulate that the payoffs are a certain kind of currency, then the payoffs are
transferable. In the future e-commerce environment, there should be some kind of
currency as the basis of payoff exchange. An authority outside the Internet, for
example a government or a corporation, may establish the trusted third party. The
observability of the results of the agents’ actions should be provided by all
the agents in the community.

In addition to the five assumptions above, constraints are also required for the
mechanism to work. The first is the proper quantum principle for compensation,
mentioned above. We argued that the offering of compensation must be fair; that is, an
agent should not make an offer smaller than that of the other agent. This principle
prevents the situation in which only one agent makes compromises while the other agent
does not react. This principle is arguable. In reality, an agent may stand still if it
does not care about failing to reach a Nash equilibrium point or has no
obligation to play the game. The second is the notion of treating payoffs as the same
if their difference is small enough. This helps agents reach the equilibrium faster,
without calculating compensation asymptotically.



5.4 The Difficulty of Applying the Negotiation Game to N-Person Games

The mechanism can potentially be extended to n-person games. So far we have discussed
only 2-person games with a trusted third party, and we do not apply the mechanism to
n-person games. The difficulty is twofold. First, it is hard to construct the
game matrix because of its complexity: the number of strategy combinations
grows exponentially with the number of agents. For the game to remain tractable, n
must be a relatively small number. Second, the concept of compensation must be
redefined. In two-person games, the compensation is offered by one agent and received
by the other. In n-person games, the compensation offered by one agent may be sent to
one or many agents under different circumstances, and it can be distributed in
many ways.



6. Conclusion 

The purpose of the negotiation game is to ensure the binding of commitments so that
the agents can get higher payoffs. To achieve this, the guarantee and
compensation communication actions and the trusted third party must be added to
the game. While the agents are rational and may change their decisions to maximize
their payoffs at any time, the mechanism prevents the agents from deviating from the
commitment and lets them get optimal payoffs. In the e-commerce environment, it is
important to be able to trust the agents in business. The mechanism provides a way to
broaden the trust from a trusted third party to any agent on the Internet. The trusted
third party is domain independent; it can mediate any kind of business.

Extending the negotiation game to n-by-n games makes it possible to model the decision
“not to play the game” as a strategy within the game. This forces the agent to play the
game, and the “must play” condition forces the agent to negotiate for a better result.
Since the agents are assumed to be rational, they will try to reach a Nash equilibrium
while seeking a higher payoff. In this paper, we show that the negotiation game can be
applied to n-by-n games. The agents can escape from the generalized prisoner’s
dilemma game and get higher payoffs in several game situations. Except in a zero-sum
or constant-sum game, the payoff each agent gets in a negotiated equilibrium is
more than what it would get in a mixed-strategy equilibrium. In a zero-sum
or constant-sum game, the agents get the same payoffs as with the mixed strategy.
However, it is hard to apply the mechanism to n-person games. We leave this as
future work.



Abstract. In this paper, we discuss the security requirements of agent
communication and their implementation in a multiagent system. A multiagent
system must run on a secure encrypted channel and provide a multilevel
check mechanism in order to cope with illegal intruders in a distributed
application environment. We propose an encrypted channel based on the
RSA and Rabin algorithms, which provides signature and encryption services
in the low layer, and present an authority-control mechanism, DSM, in the high
layer. In DSM, we design a hybrid authority check method by which an agent
can access a kind of default agent service by right of its identity or
by agent ID. A flexible security configuration based on the above mechanism
is provided.



1 Introduction 

Multiagent environments are convenient for developing distributed
applications, especially those on the Internet and for electronic commerce.
Because such an environment is generally built on a public information network, it
inevitably has security problems. Here we discuss the security mechanism adopted
by AOSDE (Agent Oriented Software Developing Environment) [6] and demonstrate
it with an example on the Internet.

In AOSDE, the definition of an agent includes the agent’s ID, its class, and the
description of its capability and security. We describe it using the Agent Description
Language (ADL); the agent builder then creates a legal agent accordingly,
endows it with a private key and a public key, and saves it to the agent base. An agent
sends service requests to other agents; the requested agents check whether the
applicant has the right to use the service and give corresponding responses.

The communication among agents is realized through the Internet with the
TCP/IP protocol or other TCP/IP-based protocols. The following factors normally
have to be taken into account:

1. The communication line is in danger of wiretapping.

2. The system must be able to confirm a user’s validity and to prevent all kinds of
deceitful actions.

Now we put forward the concept of an encrypting system. An encrypting system
can be described as $S = \{P, C, K, E, D\}$, where P is the plaintext space,
C is the ciphertext space, K is the key space, E is the encrypting algorithm, and D is
the decrypting algorithm. Given a key $k \in K$, the encrypting and decrypting
algorithms are denoted $E_k$ and $D_k$. Then $p = D_k(c) = D_k(E_k(p))$, which means
that we get c by encrypting p with $E_k$, and then get p back by decrypting c with $D_k$.
In order to provide security service, the encrypting system must offer the following
services:

1. Peer-entity authentication 

2. Data Confidentiality 

3. Data integrity 

4. Data origin authentication 

5. Non-repudiation 

The distributed security model should first consider the legal actions of agents
and the security strategy for maintaining the system. It presupposes a secure
channel that provides signatures. The distributed security mechanism
must take the message exchange among agents into account, instead of considering
only the read and write operations that an active agent performs on a static object,
as in traditional security models [1,2].

A malicious agent can intercept some ciphertexts and analyze their contents
until it obtains a ciphertext that carries an important message. It can then
send that ciphertext repeatedly and disrupt the functioning of the system.
More seriously, consider an application example inside an enterprise:
attacker A is a corporation that supplies goods to corporation B. After analysis,
A finds that the effect of a certain ciphertext in B is to pay A; A can then send this
ciphertext constantly and make B suffer losses. We can adopt multiple
verification and add time stamps so that a message between agents cannot be
replayed, which invalidates this kind of attack.
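
One way such a time-stamp check against replayed messages might look is sketched below
in Python. The class name `ReplayGuard`, the 30-second freshness window, and the use of
a per-message identifier are assumptions for illustration; they are not part of DSM as
described here. The sketch also assumes that the time stamp and identifier are covered
by the sender's signature, so that an attacker cannot alter them.

```python
from typing import Set, Tuple


class ReplayGuard:
    """Rejects messages that are too old or that have been seen before."""

    def __init__(self, max_age_seconds: float = 30.0) -> None:
        self.max_age = max_age_seconds
        self.seen: Set[Tuple[str, str]] = set()  # (sender id, message id) pairs

    def accept(self, sender_id: str, message_id: str,
               timestamp: float, now: float) -> bool:
        # Stale (or far-future) time stamps are rejected outright.
        if abs(now - timestamp) > self.max_age:
            return False
        key = (sender_id, message_id)
        # A second copy of the same message is a replay.
        if key in self.seen:
            return False
        self.seen.add(key)
        return True


guard = ReplayGuard(max_age_seconds=30.0)
first = guard.accept("A", "pay-0001", timestamp=1000.0, now=1005.0)
replayed = guard.accept("A", "pay-0001", timestamp=1000.0, now=1006.0)
stale = guard.accept("A", "pay-0002", timestamp=900.0, now=1005.0)
```

In the supplier example, the second copy of the payment message would be rejected even
though its ciphertext and signature are valid. Pruning of old entries from `seen` is
omitted for brevity.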

2 The Design of DSM in AOSDE 

In AOSDE, we propose a solution called the Distributed Security Mechanism
(DSM). In this solution, the key issue is how to offer a secure channel that supports
signatures. RSA, developed by Rivest, Shamir, and Adleman in 1978, has been regarded
as an excellent asymmetric (public-key) block cipher algorithm.
Its security is based on the difficulty of factoring the product of large prime
numbers, for which no efficient algorithm is known.
Several variants and related algorithms, such as the Diffie-Hellman key exchange
algorithm and Rabin, satisfy the requirements of secure channels [3].

The so-called digital signature [4] is a security measure taken to prove the
authenticity of both the received message and the sending source to a third party.
It can resolve disputes caused by the sender’s dishonesty and also ensures that the
sender can neither deny nor forge a message. The process includes the sender’s
signing and the receiver’s validation of the signature. The digital signature is different







Zhongzhi Shi et al. 



from a traditional signature: it is constructed by an encrypting algorithm in which
both the key and the data take part, so it changes dynamically with the key and the
data, and the signature cannot be separated from the data. When
we use the RSA algorithm, since the private key is kept secret and is unique, data
encrypted with it cannot be imitated by others, so it can serve as the foundation of
notarization and arbitration with legal significance.

Second, the mechanism needs to provide high-level services, which each agent uses to
log in, perform authority checks, and modify drafts. This work can be completed by a
special authentication center (AC). The whole DSM is encapsulated as a
fundamental ability of the agent. The abilities of the AC agent are the same as those
of other agents; the only difference is that the AC uses some managerial functions in
DSM to assign logins and keys and to submit various modification requests
as needed. This has the additional advantage that when the previous AC
“retires” for some reason, it can select a trusted agent to take over. In this
situation, the login file is sent to the successor, and the other agents are then
informed. As depicted in Fig. 1, the structure of DSM is divided into a low layer and
a high layer. The low layer provides the encrypted channel and the high layer provides
the authority check.




Fig. 1. The structure of DSM inside Agent 



3 The Design of Bottom algorithm in DSM 

The disadvantage of the traditional RSA algorithm is its slowness. In general,
encrypting or decrypting a data item needs about log N multiplications. Optimizations
such as exploiting the symmetry of modular multiplication do not bring great
improvement. The Rabin algorithm is fast and secure; it has been proved that its
complexity is equal to that of factorization. Because of the real-time requirements of
a multiagent system, we must adopt a fast and secure method: a hybrid system of
RSA and Rabin. The Rabin algorithm is used to encrypt data and RSA to carry out
digital signatures [5].

System Parameters For each user i, there are public parameters $N_i$, $e_i$, and $b$,
and private parameters $p_i$, $q_i$, $d_i$, $b_{i1}$, $b_{i2}$, where $b$ is an element
of the residues modulo $N_i$; $N_i = p_i \times q_i$, where $p_i$ and $q_i$ are
different large odd primes; $b_{i1} \in QRA(p_i)$, $b_{i2} \in QRA(q_i)$; and
$\phi_i = (p_i - 1) \times (q_i - 1)$ is the Euler function of $N_i$.
Then $e_i \times d_i \equiv 1 \pmod{\phi_i}$, where $e_i$ and $d_i$ are the public key
and the encrypting key, respectively.

Encrypting Algorithm The encrypting algorithm of user i is
$E_{N_i, b}(X) = X \times (X + b) \bmod N_i$. It is easy to see that it needs addition,
multiplication, and division modulo $N_i$.

Decryption Algorithm The decrypting algorithm is simple as a result of
$N_i = p_i \times q_i$. Given ciphertext m, we can use the algorithm of Adleman,
Manders and Miller to solve the equations $X \times (X + b) \equiv m \pmod{p_i}$ and
$X \times (X + b) \equiv m \pmod{q_i}$; suppose r and s are the respective answers.
Then compute k and l that satisfy $k \times p_i + l \times q_i = 1$ using the Euclidean
algorithm (because $p_i$ and $q_i$ are different primes, $\gcd(p_i, q_i) = 1$, so this
equation has a solution). It is easily proved that
$l \times q_i \times r + k \times p_i \times s$ is a solution of the equation
$X \times (X + b) \equiv m \pmod{N_i}$. For a prime $p_i$, the equation
$x \times (x + b) \equiv m \pmod{p_i}$ has a unique answer.

Lemma Suppose p is a prime and a is a square modulo p. If $b \in QRA(p)$, then there
exists an algorithm that computes a square root of a modulo p, and it carries out
a number of operations polynomial in $\log p$.

Proof: The computation proceeds according to the following steps.

1) Since p is an odd prime, $2 \mid (p - 1)$; let $p - 1 = 2^c P$, where P is odd.

2) Determine $k_i$ and $a_i$ inductively. From the definition of $k_i$ and $a_i$ we
can see:

$\ldots = (a_{i-1}\, b^{\ldots})^{\ldots} = \ldots\, b^{\ldots} \pmod p$    (1)

We now prove that $k_i$ is decreasing. Because $b \in QRA(p)$, from Euler's
criterion we get:

$-1 = \left(\frac{b}{p}\right) \equiv b^{(p-1)/2} \pmod p$    (2)

From (1), (2) and the definition of $k_i$,

$\ldots = (-1)(-1) = 1 \pmod p$    (3)

It follows that $k_i < k_{i-1}$, so we can find $k_n = 0$. From the definition of k,

$r_n \equiv \ldots \pmod p$    (4)

$r_i^2 \equiv a_i \pmod p, \quad 1 \le i \le n - 1$    (5)

$\ldots \equiv a_n \pmod p$    (6)

3) From this definition, it can be proved by induction that $r_1$ is a square
root of a modulo p, and the other answer is $p - r_1$. The following analysis
shows that the equation $x(x + b) \equiv m \pmod p$ has only one answer and gives
its solution. Since p is a prime, there exists k that satisfies $2k \equiv 1 \pmod p$.
It is clear that $m + k^2 b^2 \equiv x^2 + 2kxb + k^2 b^2 \equiv (x + kb)^2 \pmod p$.
According to the algorithm of the lemma, we can get the square roots
$r_1 = x + kb$ and $r_2 = p - (x + kb)$. Substituting $r_1 - kb$ and $r_2 - kb$
into the encrypting equation, we get:

$(r_1 - kb)[(r_1 - kb) + b] = (x + kb - kb)[(x + kb - kb) + b]$
$= x(x + b)$
$\equiv m \pmod p$    (7)

$(r_2 - kb)[(r_2 - kb) + b] = (p - (x + kb) - kb)[(p - (x + kb) - kb) + b]$
$= (p - x - 2kb)(p - x - 2kb + b)$
$= (p - x)(p - x - b)$
$= (p - x)^2 - pb + xb$
$= x^2 - pb + xb$
$= m - pb \not\equiv m \pmod p$    (8)

From this we draw the conclusion that $r_1 - kb$ is a solution of the
equation $x(x + b) \equiv m \pmod p$ and $r_2 - kb$ is not.
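
The encryption and decryption steps can be traced on toy numbers with the following
Python sketch. It makes several assumptions that are not in the text: both primes are
congruent to 3 modulo 4, so that a square root modulo a prime can be taken with a single
exponentiation instead of the algorithm of Adleman, Manders and Miller; the parameters
p = 7, q = 11, b = 5 are far too small for real use; and the function names are
hypothetical. The sketch keeps every root that the square-root step produces and
re-encrypts each candidate to check it, rather than selecting one root analytically.

```python
from typing import List


def encrypt(x: int, b: int, n: int) -> int:
    """E(x) = x * (x + b) mod n."""
    return (x * (x + b)) % n


def sqrt_mod_prime_3mod4(a: int, p: int) -> List[int]:
    """Square roots of a modulo a prime p with p % 4 == 3 (empty if none exist)."""
    a %= p
    r = pow(a, (p + 1) // 4, p)
    if (r * r) % p != a:
        return []
    return sorted({r, (p - r) % p})


def solve_mod_prime(m: int, b: int, p: int) -> List[int]:
    """Solve x(x + b) = m (mod p) by completing the square: (x + kb)^2 = m + (kb)^2."""
    k = pow(2, -1, p)                 # 2k = 1 (mod p); needs Python 3.8+
    kb = (k * b) % p
    roots = sqrt_mod_prime_3mod4(m + kb * kb, p)
    return [(r - kb) % p for r in roots]


def decrypt_candidates(m: int, b: int, p: int, q: int) -> List[int]:
    """Combine the solutions modulo p and modulo q into solutions modulo p * q."""
    n = p * q
    k = pow(p, -1, q)                 # k * p = 1 (mod q)
    l = pow(q, -1, p)                 # l * q = 1 (mod p)
    found = set()
    for r in solve_mod_prime(m, b, p):
        for s in solve_mod_prime(m, b, q):
            found.add((l * q * r + k * p * s) % n)
    return sorted(found)


p, q, b = 7, 11, 5
n = p * q
plaintext = 20
ciphertext = encrypt(plaintext, b, n)
candidates = decrypt_candidates(ciphertext, b, p, q)
```

Every candidate re-encrypts to the same ciphertext, and the original plaintext is among
them; how a receiver picks the intended plaintext (for example, through redundancy in
the message format) is outside this sketch.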

Signature Algorithm It adopts the standard RSA signature algorithm, which uses the
formula $P = x^{d} \bmod N$ to transform part of the plaintext content.

Signature Verification Algorithm It uses the formula $x = P^{e} \bmod N$ to transform
the signature content and compares the result with the plaintext.
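
The two formulas can be tried out with a minimal Python sketch of textbook RSA signing.
The toy parameters (p = 61, q = 53, e = 17) and the function names are assumptions for
illustration. No hashing or padding is applied, so this is not a secure signature
scheme; it only shows that verification inverts signing.

```python
def sign(x: int, d: int, n: int) -> int:
    """Signature P = x^d mod n, computed with the private exponent d."""
    return pow(x, d, n)


def verify(x: int, signature: int, e: int, n: int) -> bool:
    """Recover signature^e mod n with the public exponent e and compare with x."""
    return pow(signature, e, n) == x % n


# Toy key: n = p * q, phi = (p - 1)(q - 1), and e * d = 1 (mod phi).
p, q = 61, 53
n = p * q
phi = (p - 1) * (q - 1)
e = 17
d = pow(e, -1, phi)        # modular inverse; needs Python 3.8+

message_part = 65          # stands for the signed part of the plaintext
signature = sign(message_part, d, n)
valid = verify(message_part, signature, e, n)
tampered = verify(message_part + 1, signature, e, n)
```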

4 The Security Analysis and Implementation of 
Algorithm 

Since the RSA and Rabin algorithms have already been discussed in many books, we
state the following results:

1. If an attacker can factor N, then it can use the information in the public file to
obtain everything; the signature key can be obtained by the Euclidean algorithm.
Breaking the RSA and Rabin algorithms is therefore no more difficult than
factorization, because a quadratic non-residue of a prime can be found in
probabilistic polynomial time.

2. If one can obtain $\phi(n)$, then it is easy to calculate p and q from n and
$\phi(n)$. So computing $\phi(n)$ has the same complexity as factoring n.

3. If an attacker can obtain d satisfying $ed \equiv 1 \pmod{\phi(n)}$, then the
attacker can forge signatures, but cannot get ciphertext. By using d and e, a
probabilistic polynomial algorithm can factor n: in each cycle a parameter a is
selected at random subject to $\gcd(a, n) = 1$, and the probability of factoring n
in that cycle is 1/2.

4. For RSA, we do not know whether decryption is strictly equivalent to factorization.
For the Rabin system, however, suppose $n = pq$, where p and q are different odd
primes. If a polynomial algorithm A can solve the equation $x^2 \equiv c \pmod n$ for
c in $QR(n)$, then there exists a probabilistic algorithm B that can find a factor of
n. The probability that algorithm B factors n in one run is greater than 1/2. With a
selected at random k times subject to $\gcd(a, n) = 1$, the probability of failing to
factor n after algorithm B has run k times is less than $1/2^k$.
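
The second result can be made concrete with a few lines of Python. Since
$\phi(n) = (p - 1)(q - 1) = n - (p + q) + 1$, the sum $p + q$ follows from n and
$\phi(n)$, and p and q are then the roots of a quadratic. The function name and the toy
values are assumptions for illustration.

```python
from math import isqrt
from typing import Tuple


def factor_from_phi(n: int, phi: int) -> Tuple[int, int]:
    """Recover the primes p <= q of n = p * q from n and phi(n)."""
    s = n - phi + 1                 # p + q
    disc = s * s - 4 * n            # (p - q)^2
    root = isqrt(disc)
    if root * root != disc:
        raise ValueError("n and phi are not consistent with n = p * q")
    return (s - root) // 2, (s + root) // 2


factors = factor_from_phi(3233, 3120)   # toy modulus 61 * 53
```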

In the low layer, DSM provides a reliable end-to-end communication mechanism
based on the RSA-Rabin algorithm. It is encapsulated in the basic capability of the
agent and is transparent to the agent and its developer. Communication needs only the
target agent’s ID and the content, while the system finishes the remaining tasks
(including acquiring the public file of the target and using it to produce the
ciphertext and the digital signature). Because it is used frequently, we put it into a
Java class, SecurityBox, which is one of the bottom modules accessible while the agent
kernel is running. It provides the following standard interfaces, which expose the
basic abilities of the secure channel, where privateKey is the private key of the
agent and publicKey is its public key.

String MakeSignature(int privateKey, String source); uses the private key to
sign the plaintext source and returns the signature.

String EncryMessage(SecureFile publicFile, String source); uses the public
security file of the target agent to encrypt the plaintext source and returns the
ciphertext.

String DecryMessage(SecureFile privateFile, String secretMsg); decrypts the
ciphertext secretMsg and returns the plaintext.

Boolean CheckSignature(int publicKey, String signature, String source);
checks whether the signature produced for the plaintext source by the agent that
owns publicKey is equal to signature.

The agent uses the above interfaces to encrypt and organize data and to exchange
messages. The request ability of the agent is described as follows:

ability : request (PID targetAgent, String message) {
    // acquire the network address of the target agent through LC
    // and set up a connection
    connection = OpenConnection(targetAgent);
    String signature = MakeSignature(myPrivateKey, message);
    // encrypt the plaintext; the receiver checks the signature
    // against the decrypted message
    secretMessage = EncryMessage(targetPubKey, message);
    // send to network
    send(connection, secretMessage, signature);
}



After the service-providing agent receives a request, it calls the corresponding
SecurityBox interfaces to recover the plaintext from the ciphertext and to carry
out part of the checking. The response ability of the agent is as follows:

ability: response(PID sourceAgent,
                  String secretMessage,
                  String signature) {
    String message = DecryMessage(privateFile, secretMessage);

    IF CheckSignature(sourcePubKey, signature, message) == TRUE
    THEN PROCESS(message);
    ELSE DISCARD(message);

    // This is the reverse of REQUEST: it unpacks the data package.
    // sourcePubKey is the public key of the requesting agent and,
    // when necessary, it can be obtained from AC. When the IF
    // condition is TRUE, PROCESS() is called to handle the request
    // and to perform the high-level check discussed in the
    // following APL mechanism; otherwise DISCARD() is called.
}

The receiver must verify that the sender has the right to request its services
and confirm that the message is valid. This check routine is encapsulated in the
basic abilities of the agent and realized in the high layer of DSM.
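
The request and response abilities above can be approximated with the standard Java cryptography API. The sketch below is not the SecurityBox implementation: it swaps the paper's RSA-Rabin scheme for the JDK's stock RSA algorithms ("SHA256withRSA" signatures and RSA with OAEP padding), uses byte arrays instead of strings for signatures and ciphertext, and the class name SecureChannelSketch and its method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import javax.crypto.Cipher;

/** Sketch of sign, encrypt, decrypt and verify using stock JDK RSA (an assumption, not RSA-Rabin). */
final class SecureChannelSketch {
    private static final String SIGNATURE_ALG = "SHA256withRSA";
    private static final String CIPHER_ALG = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    static KeyPair newKeyPair() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    static byte[] makeSignature(PrivateKey key, String source) throws GeneralSecurityException {
        Signature s = Signature.getInstance(SIGNATURE_ALG);
        s.initSign(key);
        s.update(source.getBytes(StandardCharsets.UTF_8));
        return s.sign();
    }

    static boolean checkSignature(PublicKey key, byte[] signature, String source)
            throws GeneralSecurityException {
        Signature s = Signature.getInstance(SIGNATURE_ALG);
        s.initVerify(key);
        s.update(source.getBytes(StandardCharsets.UTF_8));
        return s.verify(signature);
    }

    // Plain RSA-OAEP only fits short messages (under 190 bytes for a 2048-bit key).
    static byte[] encryMessage(PublicKey key, String source) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance(CIPHER_ALG);
        c.init(Cipher.ENCRYPT_MODE, key);
        return c.doFinal(source.getBytes(StandardCharsets.UTF_8));
    }

    static String decryMessage(PrivateKey key, byte[] secretMessage) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance(CIPHER_ALG);
        c.init(Cipher.DECRYPT_MODE, key);
        return new String(c.doFinal(secretMessage), StandardCharsets.UTF_8);
    }

    /** Receiver side: returns the plaintext to PROCESS, or null when the message must be DISCARDed. */
    static String response(PrivateKey receiverKey, PublicKey senderKey,
                           byte[] secretMessage, byte[] signature) throws GeneralSecurityException {
        String message = decryMessage(receiverKey, secretMessage);
        return checkSignature(senderKey, signature, message) ? message : null;
    }
}
```

A sender would call makeSignature with its own private key and encryMessage with the receiver's public key, then transmit both results; the receiver passes them to response together with the sender's public key.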

5 Design of DSM in High Layer 

Access Permission List (APL), a structured table, is used in the high layer of
DSM. It contains the access limitations that one agent places on the use of its
abilities by other agents. An APL must contain the following information: the ID
of the APL owner; the access limitations of the agent's abilities, i.e., a list
of agents that have the right to request each ability of the agent; and Proxy
Permission Certifications (PPG) to support indirect access. Storing and
maintaining the APL is an important aspect of system design. The choice of
control strategy, centralized or distributed, affects system performance,
security, and the complexity of maintenance. We adopt a trade-off: AC stores the
APLs of all agents, while the use and maintenance of an APL are handled locally
while its agent is running.

When logging in, an agent obtains its private APL. Access authority checks are
executed locally against the APL, and maintenance and updating are also handled
locally. In this way the APL can be "hot" updated while the agent is running.
For example, AC sends requests to the related agents so that the abilities of a
new agent are added to their APLs. Another example is that an agent adds a PPG
to, or deletes it from, its APL during indirect access, which is discussed in
more detail later. When the agent exits, it returns the APL to AC so that the
APL cannot be unlawfully updated on the local machine.

There are two modes in which a client agent can access a server: direct mode and
indirect mode. In direct mode, the server examines whether the client agent is
contained in the APL. Indirect mode is more involved, because the client agent
itself has no right to use the service. If the service is nevertheless needed, a
proxy is adopted: the client agent acts as a temporary executor in place of the
authorizing agent. The authorizing agent issues a PPG with a serial number to
both the proxy agent and the server agent before the indirect access. The server
agent adds the PPG to its APL in order to check the validity of the request from
the proxy agent. It is also important that the server agent deletes the PPG from
the APL at the end of this process, so that the PPG can be used only once.
For example:

In an intranet application, agent A represents a senior staff member, Joe, agent
P represents a print server, and agent D represents a database server of a
company. In this application, A is a trusted agent. P is developed by a third
party and is regarded as an untrusted agent. Unlike P, A is authorized to ask D
for data. Now consider a task in which A wants to print the monthly sales
reports. All three agents, A, P and D, must cooperate, because P cannot get the
data directly. When A asks P to print, P accesses the database server D
indirectly. The procedure is as follows:

Step 1 Agent A issues a PPG with a serial number, specifying detailed parameters
such as the authorized service.

Step 2 Agent A sends the PPG to agent D.

Step 3 Agent A sends the PPG to agent P and asks for P's service.

Step 4 Agent D adds the PPG to its APL and saves it for later checking.

Step 5 After receiving the request and the attached PPG from agent A, agent P
asks agent D for the data request service and sends the PPG to agent D at
the same time.

Step 6 When agent D receives the service request from agent P, it checks
the PPG. If the PPG is valid, agent D sends the data to agent P. At the end
of the service, agent D deletes the PPG from the APL.
(Figure 2) 
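
The server side of Steps 4 and 6 can be sketched in Java as follows. This is a minimal illustration under assumptions: the class OneTimePpgSketch, the fields of Ppg and the use of plain strings for agent IDs and service names are hypothetical, and signature checking of the PPG itself is left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch of agent D's handling of a one-time PPG (all names are hypothetical). */
final class OneTimePpgSketch {
    /** A PPG as issued in Step 1: serial number, signer, proxy and authorized service. */
    static final class Ppg {
        final long serial;
        final String signer;
        final String proxy;
        final String service;

        Ppg(long serial, String signer, String proxy, String service) {
            this.serial = serial;
            this.signer = signer;
            this.proxy = proxy;
            this.service = service;
        }
    }

    private final Set<String> directUsers = new HashSet<>();
    private final Map<Long, Ppg> pending = new HashMap<>();

    void allowDirect(String agent) {
        directUsers.add(agent);
    }

    /** Step 4: store the PPG, but only if its signer may use the service directly. */
    boolean register(Ppg ppg) {
        if (!directUsers.contains(ppg.signer)) return false;
        pending.put(ppg.serial, ppg);
        return true;
    }

    /** Step 6: check the PPG presented by the proxy and delete it so it works only once. */
    boolean serve(String requester, long serial, String service) {
        Ppg ppg = pending.get(serial);
        if (ppg == null || !ppg.proxy.equals(requester) || !ppg.service.equals(service)) {
            return false;
        }
        pending.remove(serial);
        return true;
    }
}
```

With agent A registered as a direct user of D, a PPG naming P as proxy lets exactly one request from P succeed; a second request with the same serial number is rejected.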




Fig. 2. Indirect access flow chart



One aim of the APL is to lighten the burden on developers and users. Given the
APL standard interface, developers only need to write their own APL code and
design an APL format for their specific requirements. No APL format is
prescribed, because it is impossible for one format to cover the endless variety
of service access control modes; attempting to do so would be clumsy. We
therefore define only the basic APL interface, and the agent kernel checks the
authority of a PPG through this interface. Because AC is not responsible for
maintaining the APL and does not understand its format, the only operations
needed between AC and the APL, which is handled as a file, are to save the file
for every agent and to send or receive it at the proper time. Of course, in most
cases a simple format implemented by a model APL base, together with the default
interface of the agent kernel, will satisfy generic applications. Here is
an example:

(Owner : DataManager
    (ability : read
        directUser : (user1, user2)
        ProxyUser : (user3, user4))
    (ability : write
        directUser : (Chair, Counter)
        ProxyUser : (Printer)))

We use a LISP-like list structure to describe this example. The keywords
Owner, ability, directUser, and ProxyUser represent the owner, an ability, the
direct users, and the proxy users, respectively.

The high-level standard interface is as follows:

— boolean CheckPID(PID pID, Service s, Param p); Examines whether agent pID is
qualified to use service s with service content p, where pID is the
identifier of the agent, s is the name of the service and p is the service
content. If pID is qualified, the function returns TRUE; otherwise it
returns FALSE.

— boolean DeletePID(PID pID, Service s); Deletes the access right of agent pID
to service s.

— boolean UpdateServiceRange(PID pID, Service s, Param p); Modifies the
range of service s available to agent pID.

— boolean InsertPID(PID pID, Service s, Param p); Adds a new agent with
access to service s.

— boolean InsertProxy(PID signer, PID proxy, Service s, Param p); Adds a proxy
with access to service s. If the signer has access to the service, that is,
the signer may use the service directly, the function adds the proxy to the
APL and returns TRUE; otherwise it returns FALSE.

— boolean DeleteProxy(PID proxy, Service s, Param p); Deletes the proxy's
access to service s from the APL. Returns TRUE on success.

— boolean CheckProxy(PID proxy, Service s, Param p); Examines whether the
proxy is qualified to access service s indirectly. If so, returns TRUE.
The PPG is checked in this function.

— SendAPL(); When the agent stops, sends the APL to AC for saving.

— RestructAPL(); When the agent is initialized, gets the last APL from AC.
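
A minimal in-memory version of this interface might look like the Java sketch below. It is an illustration only: agent IDs and service names are plain strings, the Param argument is dropped, SendAPL and RestructAPL are omitted, and the class name AplSketch is hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Simplified in-memory APL: per-service sets of direct users and proxy users. */
final class AplSketch {
    private final String owner;
    private final Map<String, Set<String>> directUsers = new HashMap<>();
    private final Map<String, Set<String>> proxyUsers = new HashMap<>();

    AplSketch(String owner) {
        this.owner = owner;
    }

    String owner() {
        return owner;
    }

    boolean insertPID(String pid, String service) {
        return directUsers.computeIfAbsent(service, s -> new HashSet<>()).add(pid);
    }

    boolean deletePID(String pid, String service) {
        Set<String> users = directUsers.get(service);
        return users != null && users.remove(pid);
    }

    boolean checkPID(String pid, String service) {
        return directUsers.getOrDefault(service, Collections.emptySet()).contains(pid);
    }

    /** Adds the proxy only if the signer itself may use the service directly. */
    boolean insertProxy(String signer, String proxy, String service) {
        if (!checkPID(signer, service)) return false;
        proxyUsers.computeIfAbsent(service, s -> new HashSet<>()).add(proxy);
        return true;
    }

    boolean deleteProxy(String proxy, String service) {
        Set<String> users = proxyUsers.get(service);
        return users != null && users.remove(proxy);
    }

    boolean checkProxy(String proxy, String service) {
        return proxyUsers.getOrDefault(service, Collections.emptySet()).contains(proxy);
    }
}
```

Loading the DataManager example would amount to calling insertPID for user1 and user2 on read, then insertProxy with one of them as signer for user3 and user4, and likewise for write.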

There is only one AC in the system, running on a dedicated host, and it
maintains the Agent Security Database (ASD). Every legal host has an item in the
ASD that records its flag, password, public key, private key and APLs. When
an agent starts running, it registers with the register center by submitting its
own ID and password, packing this content using its private key and AC's public
key. After AC has checked the ID, it puts the agent into the running agent table
as a new item. Finally, it sends the agent's APL to the agent. An example is
shown in Figure 3, where A is a new agent that wants to be added to the system.




Fig. 3. Agent log in 



6 Conclusions 

With the help of a highly developed network infrastructure and mature
distributed computing technology, electronic commerce has been adopted in many
application fields as the most suitable solution. Although some standards such
as SSL have already been established, other secure communication standards, such
as the signature and encryption mechanisms proposed by industry-leading
companies such as Netscape, Sun and Microsoft, are still not available. It is
also hard to find a public security interface to support applications, except
for that of Java by Sun. It is therefore important to establish a secure
communication standard between agents. In this paper, we discussed a security
interface in the multiagent environment AOSDE. We hope to find a new way to
implement this security interface.

Abstract. Mobile agents[1] are active entities that may migrate to meet other
agents and access a place's services. For agent collaboration in a distributed
environment, mobile agents should be able to exchange messages with each
other, even while they are moving across a network. The ability to send messages
to a moving agent is an important mechanism for agent collaboration. In this
paper, we propose a CORBA[2]-based remote messaging mechanism with a binding
table. This mechanism allows messages in flight when a mobile agent moves, and
messages sent based on an out-of-date binding table entry, to be forwarded
directly to the mobile agent's new location. With this mechanism, an agent can
transfer messages directly to the target agent based on the binding table.



1 Introduction 

One of the most compelling visions of the future is a world in which specialized
agents collaborate with each other to achieve a goal that an individual agent cannot
achieve on its own. These agents would need to move, when appropriate, to conduct 
high-bandwidth conversations that could not possibly take place over a low- 
bandwidth network. A platform for building these systems, therefore, would have to 
support a powerful messaging system that allows mobile agents to communicate 
seamlessly with each other, even if they are moving across a network. 

When an agent wants to communicate with another agent, it must be able to find
the target agent in order to transmit messages. The ability to transfer messages
to a particular agent is important for agent messaging, especially in the case of
moving agents. A mechanism that delivers messages in flight to the target agent
when it moves should be provided for the collaboration of stationary and mobile
agents.

Today distributed object technology/middleware, such as OMG's Common Object 
Request Broker Architecture (CORBA), has gained considerable acceptance in the 
distributed computing environment. In addition, mobile agent technology is currently 
gaining momentum in the distributed computing environment, too. The recent OMG 
work on a Mobile Agent System Interoperability Facility (MASIF)[1] specification 
can be regarded as a milestone on the road toward a unified distributed mobile object 
middleware, which enables technology and location transparent interactions between 
static and mobile objects. 

In this paper, we propose a CORBA-based remote messaging mechanism with a
binding table in a distributed agent environment. This mechanism allows messages
in flight when a mobile agent moves, and messages sent based on an out-of-date
binding table entry, to be forwarded directly to the mobile agent's new
location. With this mechanism, an agent can transfer messages directly to the
destination mobile agent based on the binding table.

The remainder of this paper proceeds as follows. In Section 2, we describe
various mobile agent systems and the integration of mobile agent technology and
CORBA. Naming systems in a distributed agent environment are also discussed. In
Section 3, the CORBA-based direct remote messaging mechanism with the binding
table is presented, and the notification messages used in the mechanism are
explained. In Section 4, we show the performance analysis of the proposed remote
messaging system. Finally, in Section 5, we discuss the current status of this
work and the future directions that it may take.



2 Backgrounds 

In this section, we describe the current status of mobile agent systems and their
remote messaging mechanisms. We also examine the benefits of integrating mobile
agent technology and CORBA. The MASIF solutions to the CORBA Naming Service[3]
problems in a distributed agent environment are presented subsequently.



2.1 Current Mobile Agent Systems 

The Ara[4] system is a mobile agent platform under development at the University of 
Kaiserslautern. In spite of the emphasis on local interaction, a simple asynchronous 
remote messaging facility between agents is added for pragmatic reasons, appropriate 
e.g. for simple status report, error messages, or acknowledgements which do not re- 
ward the overhead of sending an agent. However to avoid remote coupling, the mes- 
saging facility will not involve itself in any guarantees against message losses. A 
message will be delivered to all agents at the indicated place whose names are subor- 
dinates of the indicated recipient name in the sense of the hierarchical agent name 
space. This addressing scheme may be used to send place-wide multicast messages or 
implement application-level transparent message forwarding by installing a subordi- 
nate proxy agent[4]. 

Aglets[5] are Java[6] objects that can move from one host on the network to
another. Aglets support remote message passing, and aglet objects can communicate
by messages remotely as well as locally. Remote message passing can be used as a
lightweight way of communicating between aglets that reside on different hosts,
and can reduce the network traffic, the cost of defining classes, and security
issues. When an aglet is created, it is automatically associated with a proxy
object that is returned to the application. The application should then use this
proxy to control the aglet. The dispatch method returns a new proxy that gives
control of the remote aglet. The proxy returned by the dispatch method looks like
any other proxy for a local aglet, but in fact it is what we call a remote proxy.
It allows the application to control the aglet through the proxy as if the aglet
were local. As a consequence of this architecture, an aglet can have no more than
one local proxy but multiple remote proxies[7].

ObjectSpace Voyager Core Technology (Voyager)[8] is an object request broker
(ORB)[2] for creating distributed Java applications. Voyager contains a superset
of features found in other ORBs and agent platforms, including CORBA, RMI[9],
General Magic's Odyssey[10], IBM's Aglets and Mitsubishi's Concordia[11]. Voyager
can integrate fundamental distributed computing with agent technology.

Voyager enables objects and other agents to send standard Java messages to an
agent even as the agent is moving. A remote object is an object that can exist
outside the local address space of an application. An application can communicate
with a remote object by constructing a virtual version of the remote object
locally. This virtual version is called a virtual object and acts as a reference
to the remote object. When messages are sent to a virtual object, the virtual
object forwards the messages to the remote object. If an object moves from one
application to another, you can still locate the object by using its last known
address. When an object moves, it leaves behind a special kind of object called a
secretary to forward all messages sent to the object's old location. If a message
arrives at an application and cannot locate its target remote object, it searches
for a secretary. If the message locates a secretary representing the object, it
uses the secretary to forward itself[12].

2.2 Mobile Agent and CORBA 

CORBA has been established as an important standard, enhancing the original
Remote Procedure Call (RPC) based architectures by allowing relatively free and
transparent distribution of service functionality[13]. In addition, mobile agent
technology has proved suitable for improving today's distributed systems. Owing
to its benefits, such as dynamic, on-demand provision and distribution of
services, reduction of network traffic and reduced dependence on servers that may
fail, various problems and inefficiencies of today's client/server architectures
can be handled by means of this new paradigm[14].




However, for several applications RPCs still represent a powerful and efficient
solution. Thus, an integrated approach is desirable, combining the benefits of
both client/server and mobile agent technology while eliminating or at least
minimizing the problems that arise if one of these techniques is used as a
"stand-alone" solution. Fig. 1 shows this integrated approach by means of the
distributed agent environment that is built on top of CORBA.

Mobile agent technology is driven by a variety of different approaches regarding
implementation languages, protocols, platform architectures and functionality. In
order to achieve a sufficient integration with CORBA, a standard is required for
mobile agent technology. This standard has to handle interoperability between
different agent platforms, and the usability of (already existing) CORBA services
by agent-based components.




Fig. 2. The MASIF Architecture 

The purpose of MASIF[16, 1] is to achieve a certain degree of interoperability
between mobile agent platforms of different manufacturers. As shown in Fig. 2,
MASIF has adopted the concepts of places and agent systems that are used by
various existing agent platforms. A place groups the functionality within an
agent system, encapsulating certain capabilities and restrictions for hosted
agents. A region[1] facilitates the platform management by specifying sets of
agent systems. Two interfaces are specified by the MASIF standard: the
MAFAgentSystem[1] interface provides operations for the management and transfer
of agents, whereas the MAFFinder functions as an interface of a dynamic name and
location database of agents, places, and agent systems.



2.3 Naming Services in Distributed Agent Environment 

The CORBA services are designed for static objects. When the CORBA naming
services are applied to mobile agents, they may not handle all cases well.

Stationary agents as well as mobile agents may publish themselves. Since a
CORBA object reference (IOR)[2] comprises, among other things, the name of the
host on which an object resides and the corresponding port number, a mobile agent
gets a new IOR after each migration. In this case, the IOR that is kept by the
accessing application becomes invalid. The following three solutions to this
problem are specified in MASIF.

1. The first solution is that the ORB itself is responsible for keeping the IOR
   of moving objects constant. The mapping of the original IOR to the actual IOR
   of the migrated agent is managed by a corresponding proxy object, which is
   maintained by the ORB. Although this capability is described by CORBA, it is
   not a mandatory feature of an ORB. Thus, the MASIF standard does not rely on
   this feature[1].

2. The second solution is to update the name binding associated with the mobile
   agent after each migration, i.e., to supply the Naming Service with the actual
   agent IOR. This can be done by the agent systems that are involved in the
   migration process or by the migrating agent itself. In this way, the naming
   service maintains the actual IOR during the whole lifetime of the agent. If an
   application tries to access the agent after the agent has changed its
   location, the application receives an exception (e.g. invalid object
   reference). In this case the application contacts the Naming Service in order
   to get the new agent IOR. A disadvantage of this solution is that the
   MAFFinder must be contacted by the migrating agent after each migration step
   so that it holds the new IOR to which each message must be sent[1].

3. When a mobile agent migrates for the first time, the original instance remains
   at the home agent system and forwards each additional access to the migrated
   instance at the new location. In this way, the original IOR remains valid, and
   the clients accessing the agent need not care about tracking it. They still
   interact with the original instance, called the proxy agent, which only exists
   to forward requests to the actual (migrating) agent. One disadvantage of this
   solution is that the proxy agent must be contacted by the migrating agent
   after each migration step so that it holds the new IOR to which each access
   request must be forwarded. Another disadvantage is that the home agent system
   must be accessible at any time. If the home agent system is terminated, the
   agent cannot be accessed anymore, since the actual IOR is only maintained by
   the proxy agent[1].



3 Direct Remote Messaging Mechanism Based on CORBA 

In this section, we propose a CORBA-based remote messaging mechanism with a
binding table to allow better messaging, so that messages can be delivered from a
source agent to a destination mobile agent without first contacting the MAFFinder
or going through the home MAFAgentSystem. The proposed mechanism can be applied
on top of the MASIF solutions, resulting in an efficient MASIF-compliant remote
messaging mechanism. It supports the second and third solutions of MASIF, i.e.,
the MAFFinder-based and home-agent-based concepts, because MASIF does not rely on
the first solution, in which the ORB maps the original IOR to the actual agent
IOR.



3.1 Notification Messages 

Notification messages are defined to provide a flexible mechanism for
MAFAgentSystems to update their binding table entries, associating the mobile
agent's unique name with its new IOR. It is the mechanism by which mobile agents
request forwarding services after migration, keeping the MAFAgentSystems'
mobility bindings up to date. Binding Update, Binding Warning, Binding Request,
and Binding Acknowledge Messages are introduced. To reduce network overhead,
notification messages should be lightweight.

• Binding Warning Message: A MAFAgentSystem receives a Binding Warning Message
  if it maintains a binding table entry for a mobile agent and uses that entry
  after it has become out of date. When a MAFAgentSystem receives a Binding
  Warning Message, it should contact the MAFFinder to find out the destination
  mobile agent's current IOR. A Binding Warning Message can also be used to
  advise a home MAFAgentSystem that another agent appears to have either no
  binding table entry or an out-of-date binding table entry for some mobile
  agent.

• Binding Request Message: A Binding Request Message is for requesting a mobile
  agent's current mobility binding from the mobile agent's home MAFAgentSystem.
  When the home MAFAgentSystem receives a Binding Request Message, it consults
  its binding table and determines the correct location information to be sent
  to the requesting agent.

• Binding Update Message: The Binding Update Message is used for notification
  of a mobile agent's current mobility binding. It can be sent by the home
  MAFAgentSystem in response to a Binding Request Message or a Binding Warning
  Message. It should also be sent by a mobile agent, or by the foreign agent
  system with which the mobile agent is registering, when notifying the mobile
  agent's previous foreign MAFAgentSystem that the mobile agent has moved.

• Binding Acknowledge Message: A Binding Acknowledge Message is used to
  acknowledge receipt of a Binding Update Message.
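
The four message types and the replies described above can be modelled compactly in Java. The sketch is illustrative only: the class NotificationSketch, the field layout of Notification and the use of plain strings for agent names and IORs are assumptions, not part of MASIF or of the proposed mechanism's actual interfaces.

```java
import java.util.Map;

/** Sketch of the four notification messages and the replies they trigger. */
final class NotificationSketch {
    enum Type { BINDING_WARNING, BINDING_REQUEST, BINDING_UPDATE, BINDING_ACKNOWLEDGE }

    /** A lightweight notification: type, unique agent name and (for updates only) the new IOR. */
    static final class Notification {
        final Type type;
        final String agentName;
        final String ior;   // null unless type is BINDING_UPDATE

        Notification(Type type, String agentName, String ior) {
            this.type = type;
            this.agentName = agentName;
            this.ior = ior;
        }
    }

    /**
     * Reply of a home MAFAgentSystem holding the given binding table.
     * Returns null when no reply is sent.
     */
    static Notification reply(Notification in, Map<String, String> bindingTable) {
        switch (in.type) {
            case BINDING_REQUEST:
            case BINDING_WARNING: {
                // Answer with the current binding, if one is known.
                String ior = bindingTable.get(in.agentName);
                return ior == null ? null : new Notification(Type.BINDING_UPDATE, in.agentName, ior);
            }
            case BINDING_UPDATE:
                // Receipt of an update is acknowledged.
                return new Notification(Type.BINDING_ACKNOWLEDGE, in.agentName, null);
            default:
                return null;   // acknowledgements need no reply
        }
    }
}
```

Keeping the message down to a type tag, a name and an optional IOR reflects the requirement that notification messages stay lightweight.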



3.2 Direct Remote Messaging with the Binding Table 

MASIF naming solutions allow any mobile agent to move about, changing its point
of attachment to the MAFAgentSystems, while continuing to be identified through
its MAFFinder or home MAFAgentSystem. An agent sends messages to a mobile agent
using its unique name, in the same way as to any other destination.

This scheme allows transparent interoperation between mobile agents and their
correspondent mobile agents, but the MAFFinder or the home MAFAgentSystem must be
contacted by the migrating agent after each migration step so that the new IOR is
known. MASIF's third solution forces all messages for a mobile agent to be routed
through its home MAFAgentSystem. This indirect routing delays the delivery of
messages to mobile agents, and places an unnecessary burden on the networks and
the home MAFAgentSystem along their paths through the Internet.

The proposed remote messaging mechanism provides a means for a MAFAgentSystem
to record the IOR of a mobile agent in the binding table and then to transfer its
own messages directly to the current location indicated in that binding, without
contacting the MAFFinder or passing through the home MAFAgentSystem. Fig. 3 shows
the architecture of the MAFAgentSystem in the proposed messaging mechanism.

The MAFAgentSystem maintains a binding table to keep track of recently
associated mobile agents. If a mobile agent uses the unique name of a
correspondent mobile agent, the MAFAgentSystem changes the destination of the
outgoing message to the current IOR of the correspondent mobile agent. Mechanisms
are also provided to allow messages in flight when a mobile agent moves, and
messages sent based on an out-of-date binding table entry, to be forwarded
directly to the mobile agent's new location.







Fig. 3. The Architecture of MAFAgentSystem 

The proposed mechanism provides a means for any agent to maintain a binding table
containing the current location of one or more mobile agents. When sending a
message to a mobile agent, if the sender has a binding table entry for the
destination mobile agent, it may transfer the message directly to the current
location in the recorded mobility binding. Fig. 4 shows the direct remote
messaging mechanism with the binding table.




Fig. 4. Direct Remote Messaging with the Binding Table 

In the absence of any binding table entry, a MAFAgentSystem can contact the
MAFFinder to find out the current IOR of the destination mobile agent, or
messages destined for a mobile agent will be routed to the home MAFAgentSystem in
the same way as defined in MASIF, and then transferred to the mobile agent's
current location. These are the only messaging mechanisms supported by MASIF.
With the proposed mechanism, as a side effect of this remote messaging to a
mobile agent, the original sender of the message may be informed of the mobile
agent's current IOR, giving the sender an opportunity to record the current IOR.

Any MAFAgentSystem may maintain a binding table to optimize its own remote
messaging with mobile agents. An agent may create or update a binding table entry
for a mobile agent only when it has received the mobile agent's mobility binding.
In addition, a MAFAgentSystem may use any reasonable strategy for managing the
space within the binding table. When a new entry needs to be added to the binding
table, the agent may choose to drop any entry already in the table, if needed, to
make space for the new entry. For example, a least-recently used (LRU) strategy
for table entry replacement is likely to work well.
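
One straightforward way to obtain the LRU replacement suggested above in Java is to build the binding table on java.util.LinkedHashMap in access order. The sketch below is only an illustration: the class name LruBindingTable, the capacity parameter and the mapping from agent name to IOR string are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Binding table (agent name to IOR) that evicts the least-recently used entry when full. */
final class LruBindingTable extends LinkedHashMap<String, String> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruBindingTable(int capacity) {
        super(16, 0.75f, true);   // true selects access order, so get() refreshes an entry
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        // Called by put(); returning true drops the least-recently used binding.
        return size() > capacity;
    }
}
```

A lookup with get counts as a use of the binding, so agents that are messaged often keep their entries while idle bindings are dropped first.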




Fig. 5. Contact MAFFinder to receive current IOR

In the case of MASIF's second solution, when any MAFAgentSystem receives a
transferred message, if it has a binding table entry for the target mobile agent,
the MAFAgentSystem receiving this transferred message may deduce that the source
MAFAgentSystem has an out-of-date binding table entry for this mobile agent.

In this case, as shown in Fig. 5, the receiving MAFAgentSystem should send a
Binding Warning Message to the source MAFAgentSystem, advising it to contact the
MAFFinder to find out the destination mobile agent's current IOR. No
acknowledgment of this Binding Warning Message is needed, since additional future
messages for the destination mobile agent transferred by the same MAFAgentSystem
will cause the transmission of another Binding Warning Message.

Fig. 6. Receiving IOR from home MAFAgentSystem

A similar approach is designed for the case of MASIF's third solution. When a
MAFAgentSystem receives a transferred message, if it has a binding table entry
for the target mobile agent, the MAFAgentSystem receiving this transferred
message may deduce that the sender agent system has an out-of-date binding table
entry for the target mobile agent.

In this case, as shown in Fig. 6, the receiving MAFAgentSystem should send a
Binding Warning Message to the home MAFAgentSystem, advising it to send a
Binding Update Message to the MAFAgentSystem that transmitted this message. As
with any Binding Update Message sent by the home MAFAgentSystem, a Binding
Acknowledge Message should be returned to the home MAFAgentSystem.
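
The choice of where a Binding Warning Message goes in the two cases can be summarized in a few lines of Java. This is a hedged sketch: the enum NamingSolution, the method warningTarget and its parameters are hypothetical, and the targetIsLocal flag is an added assumption for the case where the agent is still hosted by the receiving system and the sender's binding is therefore current.

```java
/** Sketch of the Binding Warning decision for MASIF's second and third naming solutions. */
final class BindingWarningSketch {
    enum NamingSolution { MAF_FINDER, HOME_AGENT_SYSTEM }

    /**
     * Decides who should receive a Binding Warning when a transferred message arrives.
     * Returns the address of the recipient, or null when no warning is needed.
     */
    static String warningTarget(NamingSolution solution,
                                boolean targetIsLocal,
                                boolean haveBindingForTarget,
                                String sourceSystem,
                                String homeSystem) {
        // Agent is hosted here, or nothing is known about its new location: no warning.
        if (targetIsLocal || !haveBindingForTarget) {
            return null;
        }
        // Second solution: the source is told to ask the MAFFinder itself.
        // Third solution: the home MAFAgentSystem is asked to send a Binding Update.
        return solution == NamingSolution.MAF_FINDER ? sourceSystem : homeSystem;
    }
}
```

In both cases the receiving system can still forward the message itself using its own binding table entry; the warning only repairs the sender's stale view.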

The proposed mechanism provides a means for the mobile agent's previous foreign
MAFAgentSystem to be reliably notified of the mobile agent's new location,
allowing messages in flight to the mobile agent's previous foreign MAFAgentSystem
to be forwarded to its new location.

This notification allows any messages transferred to the mobile agent's previous
foreign MAFAgentSystem, from correspondent mobile agents with out-of-date binding
table entries, to be forwarded to its new location.




Fig. 7. Message Forwarding Mechanism

The mobile agent needs to retransmit a Binding Update Message to its previous
foreign MAFAgentSystem until the matching Binding Acknowledge Message is
received. As shown in Fig. 7, the previous foreign MAFAgentSystem can forward
messages to the destination MAFAgentSystem using the information from the Binding
Update Message.
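
The retransmission rule can be sketched as a small retry loop in Java. The names BindingUpdateRetrySketch and sendUntilAcknowledged are hypothetical, the transport is abstracted as a BooleanSupplier that sends one Binding Update and reports whether the matching acknowledgement arrived, and the maxAttempts bound is an added assumption (the text itself places no limit on retransmissions).

```java
import java.util.function.BooleanSupplier;

/** Sketch: resend a Binding Update until the matching Binding Acknowledge arrives. */
final class BindingUpdateRetrySketch {
    /**
     * Calls sendAndAwaitAck up to maxAttempts times.
     * Returns the number of attempts used, or -1 if no acknowledgement was received.
     */
    static int sendUntilAcknowledged(BooleanSupplier sendAndAwaitAck, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (sendAndAwaitAck.getAsBoolean()) {
                return attempt;   // acknowledged: the previous system can now forward messages
            }
        }
        return -1;                // give up; the caller decides how to recover
    }
}
```

A real implementation would normally wait between attempts; the delay is omitted here to keep the sketch self-contained.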



4 Performance Analysis 

Integration of CORBA and mobile agent technology is desirable, combining the
benefits of both client/server and mobile agent technology while eliminating or
at least minimizing the problems that arise if one of these techniques is used
as a "stand-alone" solution.

The proposed remote messaging mechanism provides a means for a MAFAgentSystem
to record the IOR of a mobile agent in the binding table and then to transfer its
own messages directly to the current location indicated in that binding, without
contacting the MAFFinder or passing through the home MAFAgentSystem.

Table 1 shows the performance analysis of other mobile agent platforms and the
proposed remote messaging system.

Ara includes simple remote messaging capabilities, but its remote messaging system 
does not handle messages that are in flight when an agent moves, and it only supports 
the remote transfer of simple control messages. Aglet does not deal with message 
losses caused by incorrect location information. The proposed direct remote messaging 
mechanism based on CORBA allows an agent to send messages to other agents, even if 
they are moving and regardless of where they are in the network. 

Aglet and Voyager support a location-transparent messaging system based on the 
proxy agent concept. As the number of mobile agents increases, the number of proxy 
agents also increases, and maintaining many proxy agents makes agent management 
inefficient. The proposed messaging mechanism instead uses the binding table to 
provide location transparency in a distributed agent environment. 



Table 1. Performance Analysis with Mobile Agent Platforms 

                              Ara[4]   Aglet[5]   Voyager[8]   Proposed Mechanism 
Remote Messaging Efficiency   weak     weak       weak         O 
Moving Agent Support          X        weak       O            O 
MASIF compliant               X        X          X            O 
Language Independence         weak     X          X            O 
Scalability                   X        X          weak         O 

X: not supported, weak: weakly supported, O: well supported 



The proposed mechanism is MASIF compliant. Although Voyager is CORBA enabled, 
it does not conform to MASIF. 

Ara is dependent on Tcl; Aglet and Voyager are implemented in Java. Although 
Voyager is based on CORBA, it is Java dependent. The proposed remote messaging 
mechanism conforms to MASIF, so language independence is achieved. 

Both Ara and Aglet include a simple, local lookup mechanism that associates a 
string with an agent URL. Voyager contains its own directory service. These 
platforms do not support the MASIF naming service. The proposed CORBA-based remote 
messaging mechanism uses the CORBA Naming Service and the MAFFinder to provide a 
naming service in a large distributed agent environment. 



Table 2. Performance Analysis with MASIF 

                   MASIF                          Proposed Mechanism 
Message Transfer   • Through home agent system    Direct 
                   • Contact MAFFinder 
Flying Message     No specification               Forwarding 


Table 2 shows the performance enhancements of the proposed mechanism compared 
with MASIF. If the mobility binding is correct, only one message transfer is 
required. If the binding is not correct, transferred messages are forwarded to the 
destination mobile agent. 

When a mobile agent moves and registers with a new foreign MAFAgentSystem, 
the MASIF solution does not notify the mobile agent's previous foreign 
MAFAgentSystem. Messages in flight that had been transferred to the old location 
when the mobile agent moved are likely to be lost and are assumed to be retransferred 
if needed. To avoid the risk of message loss, the proposed messaging mechanism 
provides a means for the mobile agent's previous foreign agent system to be reliably 
notified of the mobile agent's new mobility binding, allowing messages in flight to the 
mobile agent's previous foreign MAFAgentSystem to be forwarded to its new location. 

A drawback of the proposed scheme is the overhead of notification messages. If 
mobile agents move frequently, the network will be flooded with notification 
messages. In the normal case, however, the proposed messaging system provides 
enhanced performance. 



5 Conclusion and Future Works 

We proposed a CORBA-based remote messaging mechanism with a binding table, 
which provides a means for any MAFAgentSystem to maintain a binding table 
containing the current IOR of one or more mobile agents. When sending a message to a 
mobile agent, if the sender has a binding table entry for the destination mobile agent, 
it may transfer the message directly to the current location in the recorded mobility 
binding. This remote messaging mechanism allows a mobile agent to send messages 
to other mobile agents, even if they are moving and regardless of where they are in 
the network. If the binding table does not have an entry for the target agent, the 
MASIF messaging mechanism can be used instead. 
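The sender-side decision summarized above (use the binding table when an entry exists, otherwise fall back to the MASIF path) can be sketched as follows. This is an illustrative sketch only: the class MessageRouter, its methods, the counter masifLookups, and the use of strings in place of IORs are assumptions, and the masifLookup function merely stands in for a lookup through the home agent system or the MAFFinder.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sender-side router; locations are strings standing in for IORs.
class MessageRouter {
    private final Map<String, String> bindingTable = new HashMap<>();
    // Stand-in for the MASIF path (home agent system or MAFFinder lookup).
    private final Function<String, String> masifLookup;
    int masifLookups = 0; // how often the fallback path was taken

    MessageRouter(Function<String, String> masifLookup) {
        this.masifLookup = masifLookup;
    }

    // Returns the location to which a message for the agent should be sent.
    String resolve(String agent) {
        String location = bindingTable.get(agent);
        if (location != null) {
            return location;              // direct transfer, one message
        }
        masifLookups++;
        return masifLookup.apply(agent);  // no binding: use the MASIF mechanism
    }

    // Applied when a Binding Update Message arrives for an agent.
    void bindingUpdate(String agent, String newLocation) {
        bindingTable.put(agent, newLocation);
    }
}

class RouterDemo {
    public static void main(String[] args) {
        MessageRouter router = new MessageRouter(agent -> "home-system");
        System.out.println(router.resolve("agent-1")); // fallback path
        router.bindingUpdate("agent-1", "host-B");
        System.out.println(router.resolve("agent-1")); // direct path
        System.out.println("fallback lookups: " + router.masifLookups);
    }
}
```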

The proposed CORBA-based direct remote messaging mechanism can be applied to 
Electronic Commerce (EC). Mobile agents can migrate to many EC-related hosts, 
gathering information (e.g., lowest prices). If one mobile agent satisfies the owner's 
request, it can transfer messages to the other mobile agents to stop their work. 
Another application area is one that requires fast message delivery capabilities. For 
example, an Intrusion Detection System (IDS) [17] needs real-time notifications and 
responses. Our messaging system can be used in a real-time IDS for fast collaboration 
and reaction among mobile agents. 

As future work, we are considering a security-enhanced messaging system. Agent 
security is an important issue in a distributed agent environment. Because messages 
can be routed through several agent systems, message integrity and confidentiality 
should be provided by cryptographic means [18]. 

Abstract. In this paper, we present a Mobile Thread Programming 
Model (MTPM), a model to simulate the persistence of a migratory 
thread, to overcome the problem of coexistence of mobility, persistence 
and autonomy for mobile agents. An advantage of MTPM over other 
code mobility paradigms is that the model simulates strong mobility at 
the application level rather than at the system level, as is done in many 
systems that support strong mobility. Migrating threads at the system 
level is runtime dependent. MTPM, however, is constructed on the Java 
Virtual Machine (JVM) by using Serialization and Remote Method 
Invocation (RMI), and is thus suitable for heterogeneous environments 
without introducing new space and time complexities in the implementation. 
Distributed Task Plan (DTP), which is detailed in this paper, is a flexible 
implementation model of MTPM used to simulate the persistence of an 
agent thread. A DTP is also embedded with navigational and 
computational autonomy, so that a mobile agent can obtain a 
continuous and autonomous workflow simply by executing a DTP. 

1 Introduction 

The mobile agent is one of the promising technologies used to deal with the 
application challenges raised by the increasing growth and diffusion of network 
systems, especially the Internet. Different systems [5] [6] [9] [12] have been proposed 
to implement mobile agents, but few systems support the autonomy of mobile agents 
that many WWW applications, such as mobile computing [12, 16], depend on. In a 
mobile computing application, a user launches a mobile agent from a laptop that is 
connected to the Internet, and then disconnects the laptop from the Internet. The 
mobile agent travels through the Internet autonomously, retrieving and updating 
information locally on behalf of its owner. Later, the mobile agent returns to the 
user's laptop and reports the results when the laptop is reconnected to the 
Internet. Mobile agents should have the "intelligence" of self-contained navigation and 
computation, which gives them the power to adapt to dynamic and 
heterogeneous networks, because in most cases mobile agents cannot interact with 
their owners. 

There are two kinds of features that must be satisfied by mobile agents in the 
context of autonomy: the persistence of an agent thread, and self-containment 
in navigation and computation. Unfortunately, the coexistence of mobility, 
persistence and autonomy is difficult to achieve and is not adequately modeled or 
supported by most existing mobile agent systems. This paper proposes a Mobile 
Thread Programming Model (MTPM) with its implementation model, Distributed 
Task Plan (DTP) [11]. MTPM is an application-level model to simulate the 
persistence of threads after an agent migration. MTPM deals with the heterogeneity of 
agents' execution environments by means of the JVM, without introducing any new 
space and time complexities in the implementation. DTP is a flexible implementation 
model of MTPM. DTP complies with MTPM's programming paradigm and is embedded 
with navigational and computational autonomy. An agent plans its DTP when the 
agent is generated. When a DTP is executed by a mobile agent, the DTP generates 
continuous and autonomous workflows for the agent. 

This paper is organized as follows. In Section 2, we analyze the features of agent 
mobility and summarize the limitations of widely studied technologies, which are 
unsuitable for generating persistent and autonomous workflows for mobile agents. In 
Section 3, we propose a new model, MTPM, for agent migration and prove its 
correctness. In Section 4, we outline the foundation of the MTPM implementation, 
which uses Object Serialization [15] and Remote Method Invocation (RMI) 
[16]. In Section 5, we describe an implementation model of MTPM, DTP. A DTP 
plans distributed tasks for mobile agents. The execution of a DTP generates 
continuous workflows with navigational and computational autonomy for mobile 
agents. In Section 6, we compare MTPM with typical related work across several 
factors. Finally, we present our conclusions and directions for future research 
in Section 7. 

2 Problem Description on Agent Migration Mechanisms 

Generally speaking, two kinds of agent migration mechanisms can be 
distinguished. They are often called weak and strong mobility [3]. Weak mobility 
permits an agent to migrate only with its code and the values of its variables. After 
migration, the agent is restarted and the values of its variables are restored, but the 
agent's execution starts from the beginning or from a special method rather than from 
the point at which it stopped before migration. Weak mobility does not support the 
persistence of agent threads. Many mobile agent systems only support weak mobility 
of agents, among them Odyssey [5], Voyager [12], Java-To-Go [11], Aglets [9], 
Facile [17], Tacoma [8], Mole and Grasshopper [7]. Strong mobility permits the agent 
to migrate not only with its code but also with the whole state of thread execution. 
After migration, an agent resumes its execution exactly from the point where it was 
suspended before migration, so strong mobility supports the persistence of agent 
threads. Some mobile agent systems support strong mobility of agents, among them 
Telescript, Agent Tcl [6], Ara [13] and Sumatra [1]. 

In many systems that support weak mobility, the underlying mechanism is to 
program a mobile agent kernel with many different methods that will be 
executed by the agent at different network nodes. When an agent executes a mobile 
primitive for migration, the agent must explicitly provide a destination address and a 
method to be executed at that destination. Strong mobility, on the other hand, requires 
that the mobile agent server transparently capture, at an arbitrary point, the thread's 
execution mapping of any agent, transport the captured mapping and restore the 
transported mapping after agent migration. 

The state of the art is that mobile agent systems with weak mobility are widely 
accepted across platforms because they are often built with popular languages such as 
Java, but they suffer from the following limitations for programming autonomous 
mobile agents. 

1. Few procedures or primitives are provided to support agents' autonomy 
in mobility and computation. Although it is possible, it is difficult to program 
a mobile agent with the desired autonomy. 

2. Their programming paradigms are not designed for workflow models [2], so they 
provide no inherent support for designing an autonomous agent. It is difficult for 
them to generate continuous workflows. 

3. A mobile agent and its distributed tasks are programmed in the same program 
unit (or class), so both reusability and flexibility are lost. A mobile agent can 
execute only one distributed task unless its code is revised. 

Persistence is fundamental for the next generation of agent-based applications 
[14]. Although current mobile agent systems with strong mobility are easy to extend 
to support autonomy for mobile agents, they are often constructed with special 
languages, or they modify the specification of a popular language platform such as 
the JVM to facilitate the capture of an agent's execution state. This prevents them from 
being widely accepted and used to build agent-based applications on multiple 
platforms. An evident example is that General Magic rewrote its mobile agent system, 
Telescript, as Odyssey in Java in order to be widely accepted. In addition, because 
threads are strongly bound to the runtime system, it is difficult for a system-level 
implementation of strong mobility to deal with the heterogeneous environments in 
which mobile agents roam. It is also inefficient to implement persistence at the system 
level by capturing, transporting and restoring the execution state of the agent thread, 
because an agent thread carries a large amount of execution stack and heap 
information. It is reported in [4] that strong mobility can be implemented at the 
language level, but [4] introduces extra time and space overhead. 

In the context of autonomy, agents must have two features: the persistence of the 
agent thread, and self-containment in navigation and computation. The limitations of 
current agent migration technologies have led us to design a new mobile agent system 
in Java, Mobile Agent Template (MAT) [10], to support autonomous mobile 
agents. In MAT, we program mobile agents with the MTPM paradigm. With MTPM we 
pursue the coexistence of persistence, mobility and autonomy. Fully transparent 
migration is not a necessity. MTPM simulates strong mobility at the application level 
using a lightweight implementation on the JVM, so it is suitable for programming 
mobile agents for heterogeneous environments without introducing new space and 
time complexities in the implementation. When it is generated, a mobile agent plans its 
DTP, which is the implementation model of MTPM. The execution of a DTP generates 
continuous and autonomous workflows for mobile agents. MTPM does not need any 
modification of the JVM; it uses two mechanisms provided by the JVM, Serialization 
and RMI. 

3 Mobile Thread Programming Model 



3.1 Persistence Simulation of a Mobile Agent's Thread 

In this section, we introduce MTPM, which is a model to simulate the persistence 
of migratory threads for mobile agents at the application level. This model depends on 
the Serialization and RMI mechanisms of the JVM. Writing an object's state into a 
serialized form is sufficient to reconstruct the object when it is read back. Writing and 
reading objects in this way are called object serialization and deserialization. A thread 
is ultimately code and data, and we suspect that state can always be represented by 
data. Object serialization and deserialization are essential and sufficient to simulate a 
persistent state of an agent thread, based on the following proposed theory. 

Definition 1: The serialization operation on an object Obj is denoted as Ser(Obj), and 
the deserialization operation on a serialized object Ser(Obj) is denoted as 
Deser(Ser(Obj)). 



[Figure: an Agent class with its member variable entry_point (Zone A), the body of 
run() outside the switch(entry_point) statement (Zone B) and the individual case 
branches (Zone C). On the local host, the Agent Server executes 
AgentMobileTo(Destination, agent), which calls agent.onDispatch() and 
TransferTo(Destination, agent). On the remote host, the Agent Server executes 
AgentTransferIn(agent), which calls agent.onArrival() and new Thread(agent).start().] 

Fig. 1 The migration simulation of agent's thread by MTPM 

In order to simulate the state persistence of an agent's thread, the following three 
methods must be provided by a mobile agent. 

onDispatch(): This method is called just before an agent migration. It performs 
serialization operations on every non-transient object Obj in Zone A (see Fig. 1) of an 
agent, i.e. for each Obj in Zone A of an agent, it performs Ser(Obj). 

onArrival(): This method is called just after a serialized agent object is 
transported to the destination by RMI. Conversely to onDispatch(), it performs 
deserialization operations on every serialized object Ser(Obj) of an agent, i.e. for each 
serialized object Ser(Obj) of an agent, it performs Deser(Ser(Obj)). 

run(): This method is the running method of an agent thread. It is 
called when an agent thread is generated at the home machine or restored at a remote 
network node. In order to support the simulation of the state persistence of an agent's 
thread, the run() method should use the switch(entry_point) paradigm shown in 
Fig. 1. The run() method consists of switch-case statements. Every mobile primitive is 
the last statement in a case branch, and a mobile primitive sets a new entry point, from 
which the run() method resumes when it is called again at the next destination. 

With the above three methods defined, the migration simulation of an agent 
thread is illustrated graphically in Fig. 1. When an agent executes a mobile 
primitive such as AgentMobileTo(Destination, 2), the mobile primitive sets the new 
entry entry_point to 2 and calls the method AgentMobileTo(Destination, this) of the 
current Agent Server. The Agent Server calls back the agent's method onDispatch() to 
give the agent an opportunity to serialize its objects. Then the current Agent 
Server calls its method TransferTo(Destination, agent) to transport the agent object to 
the Agent Server at the destination. In fact, the current Agent Server calls the remote 
method TransferIn(agent) of the Agent Server at the destination by RMI. RMI 
permits an object reference graph to be passed as a parameter to a remote method, so 
the current Agent Server in effect transports the agent to the destination in the form of 
a serialized object. The first action of TransferIn is to call back the agent's method 
onArrival() to deserialize the agent's serialized objects; it then generates a new thread 
to execute the agent. The agent is executed from the statement case 2 when its method 
run() is called again. 
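The switch(entry_point) paradigm and the transfer step described above can be sketched as follows. This is a minimal sketch under stated assumptions rather than the MAT implementation: the names SketchAgent, mobileTo, MigrationSketch and transfer are hypothetical, the "remote host" is simulated by serializing to a byte array inside one JVM instead of calling TransferIn over RMI, and the driver performs the transfer after run() returns instead of the Agent Server being called from inside the mobile primitive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical agent written in the switch/entry_point style.
class SketchAgent implements Serializable, Runnable {
    private static final long serialVersionUID = 1L;
    int entryPoint = 1;                          // Zone A: survives migration
    final List<String> log = new ArrayList<>();  // Zone A: survives migration

    // Stand-in for the mobile primitive: records where run() should resume.
    private void mobileTo(int nextEntry) {
        entryPoint = nextEntry;
    }

    @Override
    public void run() {
        switch (entryPoint) {
            case 1:
                log.add("work at home");
                mobileTo(2);   // the mobile primitive is the last statement of the branch
                break;
            case 2:
                log.add("work at destination");
                break;
            default:
                break;
        }
    }
}

class MigrationSketch {
    // Simulates transport: Ser(agent) on the sender, Deser(Ser(agent)) on the receiver.
    static SketchAgent transfer(SketchAgent agent) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(agent);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SketchAgent) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SketchAgent agent = new SketchAgent();
        agent.run();                             // executes case 1 and sets entryPoint to 2
        SketchAgent arrived = transfer(agent);   // copy that "arrives" at the destination
        Thread thread = new Thread(arrived);     // restart the agent in a new thread
        thread.start();
        thread.join();
        System.out.println(arrived.log);
    }
}
```

Because entryPoint and log are member variables, both are carried across the transfer, and the restarted run() continues with case 2 instead of repeating case 1.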



3.2 Proof of MTPM’s Correctness 

Generally, objects that are generated by an agent are in Zone A, Zone B or Zone 
C. Objects in Zone A can be persisted by Java Serialization because they are 
class-level variables. Objects in Zone B or Zone C, however, cannot be persisted by 
Java Serialization because the local variables of a method are located in the method 
call stack of the JVM and cannot be reached by Java Serialization. According to the 
object-oriented paradigm, however, any object that is generated in a method is local 
and transient, so no persistent information of an agent depends on objects in Zone B or 
Zone C. In addition, any object in Zone B, which may be used in the following case 
branches, is regenerated when the method run() is called again, and objects in Zone C 
do not depend on each other if they are in different case branches. Summarizing the 
above features of the proposed paradigm of agent design, we have the following 
theorem, which states that the persistence of an agent thread can be simulated by 
MTPM. 

Axiom 1: For any object Obj, Obj is equal to Deser(Ser(Obj)). 

Axiom 2: The execution state of an agent's thread is determined only by the 
states of the agent's objects in Zone A and the execution point of the method run() in 
the MTPM paradigm. 

Theorem 1: The persistence of an agent thread can be simulated by MTPM during 
agent migration. 

Proof. We must define entry_point as a member variable of a mobile agent class 
because Serialization cannot capture any local variable of a method. Suppose an 
agent executes a mobile primitive AgentMobileTo(Destination, k), which is 
statement number k-1 of the agent's method run(); then the execution stop-point k is 
stored in the object entry_point of the agent. Also suppose the agent has valid objects 
Obj_1, Obj_2, ..., Obj_n (of course including the object entry_point) in Zone A; then the 
method onDispatch() of the agent will serialize all the objects, i.e. it performs 
Ser(Obj_1), Ser(Obj_2), ..., Ser(Obj_n), when the method is called just after 
AgentMobileTo. When the agent object is transported to the destination, its method 
onArrival() is called. The method onArrival() deserializes all the serialized objects, 
i.e. it performs Deser(Ser(Obj_1)), Deser(Ser(Obj_2)), ..., Deser(Ser(Obj_n)). From 
Axiom 1, Obj_i is equal to Deser(Ser(Obj_i)), where i belongs to {1, 2, ..., n}, so the 
states of all the objects in Zone A of the agent (of course including the object 
entry_point) are persistent. (a) 

When the agent is restarted at the destination, its thread's execution method, 
run(), is called. All the objects in Zone B are regenerated and the method run() 
executes from the case statement that is determined by the object entry_point. 
Because the object entry_point is k, the stopped execution point is restored from 
statement k after the agent migration has been completed. (b) 

From (a) and (b), the execution state of the agent's thread is persistent after the 
agent migration according to Axiom 2. 
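Axiom 1 can be checked for a concrete class with a short Java sketch. The example below is illustrative only: the class ZoneAState and the helpers ser and deser are hypothetical names, and "equal" is interpreted here as value equality through equals(), which is an assumption about how the axiom is meant to be read. The sketch also shows that a field marked transient is not part of the serialized form and is therefore not restored.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Objects;

// Hypothetical holder for an agent's Zone A state.
class ZoneAState implements Serializable {
    private static final long serialVersionUID = 1L;
    final int entryPoint;
    final String partialResult;
    transient String scratch = "temporary";   // excluded from the serialized form

    ZoneAState(int entryPoint, String partialResult) {
        this.entryPoint = entryPoint;
        this.partialResult = partialResult;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ZoneAState)) return false;
        ZoneAState other = (ZoneAState) o;
        return entryPoint == other.entryPoint
            && Objects.equals(partialResult, other.partialResult);
    }

    @Override
    public int hashCode() {
        return Objects.hash(entryPoint, partialResult);
    }
}

class RoundTrip {
    // Ser(Obj): writes the object into a byte array.
    static byte[] ser(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    // Deser(Ser(Obj)): reconstructs an object from its serialized form.
    static Object deser(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialization failed", e);
        }
    }

    public static void main(String[] args) {
        ZoneAState original = new ZoneAState(2, "lowest price so far");
        ZoneAState restored = (ZoneAState) deser(ser(original));
        System.out.println("equal by value: " + original.equals(restored));
        System.out.println("same instance: " + (original == restored));
        System.out.println("transient field restored: " + (restored.scratch != null));
    }
}
```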

4 Foundation of MTPM Implementation 

Our agents need persistence, which is the ability of an object to record its 
execution state so that the state can be reproduced in other environments. With the 
release of Java 1.1, the Java community gained access to a wide variety of features. 
The features that contribute most to the implementation of MTPM are object 
Serialization and RMI. Combining them makes it possible to simulate the persistence 
of an agent's thread with object persistence. 

Object Serialization provides a program with the ability to write a whole 
object to, and read it from, a raw byte stream. It allows objects and primitives to be 
encoded into a byte stream suitable for streaming to some type of network or to a file 
system, or more generally, to a transmission medium or storage facility. The real power 
of object Serialization is the ability of programs to easily read and write entire objects 
and primitive data types, without converting to and from raw bytes or parsing clumsy 
text data. Object Serialization has taken a step in the direction of being able to store 
objects instead of reading and writing their state in some foreign and possibly 
unfriendly format. In order to be persistent, the class definition of a mobile agent 
should implement the Serializable interface. We can customize serialization for an 
agent by providing the two methods writeObject and readObject in the 
agent. In the agent implementation, these two methods are the functional equivalents 
of onDispatch() and onArrival() in MTPM. The process of serializing an object 
involves traversing the graph created by each object's references to other objects and 
primitives, so all the objects, including the agent object and the objects reachable 
through references from the agent, are preserved during agent Serialization. 
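Customizing serialization with writeObject and readObject, as mentioned above, might look like the following. This is a hedged sketch: the class NotebookAgent, its fields and the helper CustomSerializationDemo.copy are hypothetical, and the buffer is declared transient purely to give the custom methods something to do. The writeObject and readObject signatures and the calls to defaultWriteObject and defaultReadObject are the standard hooks of java.io.Serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical agent that handles one of its fields by hand.
class NotebookAgent implements Serializable {
    private static final long serialVersionUID = 1L;
    private int entryPoint = 1;
    // Declared transient for illustration, so default serialization skips it.
    private transient StringBuilder buffer = new StringBuilder();

    void note(String text) { buffer.append(text); }
    String notes() { return buffer.toString(); }

    // Plays the role of onDispatch(): invoked by ObjectOutputStream.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();          // writes the non-transient fields
        out.writeUTF(buffer.toString());   // flattens the transient buffer
    }

    // Plays the role of onArrival(): invoked by ObjectInputStream.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();                     // restores the non-transient fields
        buffer = new StringBuilder(in.readUTF());   // rebuilds the transient buffer
    }
}

class CustomSerializationDemo {
    // Serializes and deserializes the agent in memory.
    static NotebookAgent copy(NotebookAgent agent) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(agent);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (NotebookAgent) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        NotebookAgent agent = new NotebookAgent();
        agent.note("visited host A; ");
        NotebookAgent restored = copy(agent);
        restored.note("visited host B");
        System.out.println(restored.notes());
    }
}
```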

RMI enables a program running on a client computer to make method calls on an 
object located on a remote server machine. Object-oriented design requires that every 
task be executed by the object most appropriate to that task. RMI takes this concept 
one step further by allowing a task to be performed on the machine most appropriate 
to the task. A client can invoke the methods of a remote object with the same syntax 
that it uses to invoke methods on a local object. RMI has several advantages over 
the traditional Remote Procedure Call (RPC). RMI can pass full objects as arguments 
and return values. This means that we can pass complex types such as an agent object 
as a single argument without extra conversion code. Passing objects lets us use the full 
power of object-oriented technology in agent migration. When passing an object as an 
argument, RMI moves the class implementation of the object at the same time. In this 
way, RMI moves behavior from a client to a server or from a server to a client, so we 
can benefit from fully object-oriented patterns for agent design. In addition, RMI uses 
built-in Java security mechanisms that allow the agent system to be safe when moving 
agents. Customized security mechanisms are easily integrated into an agent system with 
the RMI security model, for example by specifying a Security Manager in Java 1.1 or a 
Policy File in Java 1.2. With RMI we can write a mobile agent system in its simplest 
form like this: 

import java.rmi.*; 

// Remote interface of an Agent Server. The exception types other than 
// RemoteException are application-defined and are not shown here. 
public interface AgentServer extends Remote { 
    void TransferTo(String Destination, Agent agent) 
        throws RemoteException, InvalidAgentException; 
    void TransferIn(Agent agent) 
        throws RemoteException, InvalidAgentException; 
    AgentServer getRemoteAgentServer(String Destination) 
        throws RemoteException, AgentServerNotFoundException; 
    void registerAgentServer(String agentServerName) 
        throws RemoteException, AgentNameInvalidException; 
    void unregisterAgentServer(String agentServerName) 
        throws RemoteException, EntryNotFoundException; 
    void createAgent(String agentClassName, Class parameterTypes[], Object initargs[]) 
        throws RemoteException, InvalidAgentException; 
} 

import java.io.Serializable; 

// A mobile agent; it is Serializable so that RMI can pass it by value. 
public interface Agent extends Serializable { 
    void onDispatch(); 
    void onArrival(); 
    void run(); 
} 

In an agent system, an Agent Server is a remote object to which other Agent 
Servers in the system have references. Transporting an agent is a matter of 
creating a class that implements the Agent interface, finding a server, and invoking 
its TransferIn method with the agent object as an argument. The implementation of 
the agent is transported to the server and run there. We do not have to write 
the two methods onDispatch() and onArrival() if we are content with the default 
serialization that RMI performs for a mobile agent. After deserializing the agent, the 
TransferIn() method starts a new thread for the agent and invokes its run() 
method. 
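The receiving side described above (call back onArrival(), then run the agent in a new thread) can be sketched without the RMI transport. Everything named here is an assumption made for illustration: MobileAgent is a variant of the Agent interface that also extends Runnable so that it can be handed to a Thread, LocalAgentServer is an in-process stand-in for a remote Agent Server, and CountingAgent exists only to make the calls observable.

```java
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical variant of the Agent interface that can be run by a Thread.
interface MobileAgent extends Serializable, Runnable {
    void onDispatch();
    void onArrival();
}

// In-process stand-in for the destination Agent Server (no RMI involved).
class LocalAgentServer {
    // Mirrors TransferIn from the text: notify the agent, then run it in a new thread.
    Thread transferIn(MobileAgent agent) {
        agent.onArrival();
        Thread thread = new Thread(agent);
        thread.start();
        return thread;
    }
}

// Trivial agent that counts how often it was notified and run.
class CountingAgent implements MobileAgent {
    private static final long serialVersionUID = 1L;
    final AtomicInteger arrivals = new AtomicInteger();
    final AtomicInteger runs = new AtomicInteger();

    @Override public void onDispatch() { }
    @Override public void onArrival() { arrivals.incrementAndGet(); }
    @Override public void run() { runs.incrementAndGet(); }
}

class TransferInDemo {
    public static void main(String[] args) throws InterruptedException {
        CountingAgent agent = new CountingAgent();
        Thread thread = new LocalAgentServer().transferIn(agent);
        thread.join();   // wait until the agent's run() has finished
        System.out.println("arrivals=" + agent.arrivals.get() + ", runs=" + agent.runs.get());
    }
}
```

With real RMI, the agent argument would reach transferIn as a deserialized copy; the control flow after that point is the same as in this sketch.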

5 DTP: A Flexible Implementation of MTPM 

In fact, in the implementation of MTPM, we do not program all the tasks to be 
executed by an agent in its run() method, because this kind of design cannot support 
flexibility, reusability or a workflow model. Instead, we use a flexible implementation, 
Distributed Task Plan (DTP), to support continuous and autonomous workflows for 
mobile agents. 

5.1 Architecture of DTP 

In this section, we introduce Distributed Task Plan, which is a flexible 
implementation of MTPM for generating continuous and autonomous workflows for 
mobile agents. In MAT, we define two kinds of autonomies for mobile agents. 

Definition 2: Mobile autonomy is the capability of self-navigation of mobile agents 
through the underlying network. 

Definition 3: Computational autonomy is the capability of self-containment of mobile 
agents in computational functions for the accomplishment of a distributed task. 

In order to obtain desirable autonomies for mobile agents, firstly, we provide 
enough programming components called autonomous primitives, which are used to 
construct a DTP and further to provide mobile agents with autonomies in the 
navigation and the computation. Four kinds of autonomous primitives are defined in 
MAT for DTP designing. 

1. Mobile primitives define the mobility of an agent. A mobile agent can simply 
transport itself from the current network node to the next destination by calling a 
simple migration primitive, or clone itself and transport each of its duplicates to a 
different destination by calling a multiple migration primitive. 

2. Computational primitives define invocations of computational resources. A 
computational primitive specifies where to find the current computational procedure, 
how to load it and how to execute it. By using the computational primitives, a mobile 
agent determines (a) whether the current computational procedure is carried by the 
agent or is resident at the visited node; (b) whether the procedure should be started in 
a different process or loaded into its own process; and (c) how to run the computational 
procedure, for example synchronously or asynchronously. 

3. Solution synthesis primitives define the combination of multiple solutions 
from different mobile agents. The solution synthesis is needed when a task is divided 
into several subtasks and executed by different mobile agents concurrently. It is 
highly efficient to divide a task into several subtasks and to assign these subtasks to 
different mobile agents for the executions when the task can be executed concurrently 
and multiple resources are available. 

4. Control primitives define the execution flow of mobile primitives, 
computational primitives and solution synthesis primitives. Enough control structures 
in control primitives are needed to efficiently coordinate the executions of all the 
above three primitives. 

Having defined the primitives, we provide a model for designing a DTP, which 
describes distributed tasks for mobile agents by means of these predefined 
autonomous primitives. Normally, a DTP is composed of all four kinds of 
autonomous primitives. 

Definition 4: A Distributed Task Plan (DTP) is a static description of a distributed 
task, which is to be executed by a mobile agent. 

A DTP consists of primitives, which are arranged into two lists. One list, which we 
call the control queue (CQ), contains only control primitives, and the other list, 
which we call the reusable primitive list (RPL), contains all primitives other than 
control primitives. The architecture of a DTP is illustrated in Fig. 2 
(the concrete meanings of the primitives in Fig. 2 are defined in [11]). 




When it is generated, a mobile agent plans its own DTP for the execution of a 
distributed task satisfying a user's requirements. The planning includes Objective 
Matching, Primitive Selection and DTP Generation, using the user's requirements, 
network state information and task features. A mobile agent can also replan its DTP 
when the current DTP fails during execution. The planning of a DTP is detailed 
in [11]. 

Definition 5: An execution of a DTP by a mobile agent in a dynamic network 
environment is a continuous and autonomous workflow of the mobile agent. 




Fig. 3 The continuous and autonomous workflow generated by a DTP 

A mobile agent has a reference to its DTP, and a DTP has a reference to a CQ. All 
the objects, from the agent itself and its DTP down to the autonomous primitives, have 
a run() method. Every object's run() method just calls the run() method of another 
object to which the former has a reference. The autonomous workflow of a mobile 
agent is generated when the mobile agent executes its DTP by calling its run() method, 
as shown in Fig. 3. 

The order of primitives in a CQ is important. The control primitives in a CQ are 
executed sequentially. A control primitive in a CQ has one or more references to 
primitives in a RPL corresponding to which type the control primitive is. A reference 
of a control primitive in a CQ depicts a possible invocation to a primitive in a RPL. 
The order of primitives in a RPL is not important because the invocations to them are 
determined only by references of control primitives in a CQ. A RPL is just a 
repository of autonomous primitives that a mobile agent may need to execute when 
transporting in underlying networks. So a mobile agent only executes control 
primitives in a CQ one by one, then further executes primitives in a RPL. An 
execution of a DTP is a continuous and autonomous workflow of a mobile agent. 
Constructing a complex workflow by using DTP provides a mobile agent with 
autonomy, flexibility and reusability in distributed applications. 
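
The chain of run() calls described above can be illustrated with a minimal sketch. It is not the implementation of [11]: the class names, the `select` callback and the `trace` list are hypothetical, and Python is used only for brevity. The sketch shows an ordered CQ whose control primitives hold references into an unordered RPL, and an agent whose run() simply delegates to its DTP.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AutonomousPrimitive:
    """A non-control primitive kept in the RPL (hypothetical shape)."""
    name: str

    def run(self, trace: List[str]) -> None:
        trace.append(self.name)  # stand-in for the primitive's real work


@dataclass
class ControlPrimitive:
    """A CQ entry holding references (by name) to primitives in the RPL."""
    name: str
    refs: List[str]
    select: Callable[[List[str]], List[str]]  # which references to invoke

    def run(self, rpl: Dict[str, AutonomousPrimitive], trace: List[str]) -> None:
        for ref in self.select(self.refs):
            rpl[ref].run(trace)


@dataclass
class DTP:
    cq: List[ControlPrimitive]           # order matters: executed sequentially
    rpl: Dict[str, AutonomousPrimitive]  # order irrelevant: reached via references

    def run(self, trace: List[str]) -> None:
        for control in self.cq:
            control.run(self.rpl, trace)


@dataclass
class MobileAgent:
    dtp: DTP
    trace: List[str] = field(default_factory=list)

    def run(self) -> List[str]:
        self.dtp.run(self.trace)  # the agent's run() only delegates to its DTP
        return self.trace


def demo() -> List[str]:
    rpl = {name: AutonomousPrimitive(name) for name in ("report", "search", "move")}
    take_all = lambda refs: refs
    take_first = lambda refs: refs[:1]
    cq = [
        ControlPrimitive("sequence", ["move", "search"], take_all),
        ControlPrimitive("branch", ["report", "move"], take_first),
    ]
    return MobileAgent(DTP(cq, rpl)).run()
```

Reordering the entries of `rpl` does not change the trace, whereas reordering `cq` does, which mirrors the distinction drawn above between the CQ and the RPL.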

6 Related Work 

To our knowledge, providing transparent migration for agents at language-level 
is done in [4] [18], and providing mobility, persistence and autonomy for agents at the 
same time is done in very few models besides our model. To capture the state of an 
agent for fully transparent migration, [4] has developed a preprocessor that 
instruments the programmers' Java codes by adding codes. Those added codes do the 
actual state capturing, and reestablish the state on restart at the target machine. [4] 
does this instrumentation by parsing the original program code using a Java based 
parser. In fact, what is done by [4] is a mechanical transformation of codes written for 
transparent migration into codes written for non-transparent migration. [4] has to deal
with complex problems, such as saving and rebuilding local variables, objects and the
method call stack, but leaves thread synchronization to programmers. In [18], a self-
migration computation is separated into two layers. The computational layer consists 
of an arbitrary collection of functions distributed throughout the system, and the 
coordination layer deals primarily with the locations at which various functions are to 
be executed and the communication among functions. In [18], the programmers' 
original script must satisfy the following three conditions for facilitating the 
transformation of the original script into a pseudo code script that supports transparent 
migration: 1. The original script consists of only function calls; 2. All functions are 
numbered and each knows its possible successors; and 3. Any statement that may 
cause a context switch may execute only as the last statement of a function. Tab. 1
compares some important features of [4], [18] and MTPM.

MTPM provides a lightweight mechanism that is functionally equivalent to
transparent migration. Unlike [4], MTPM does not introduce any extra time or space
overhead; its only restriction is the third condition of [18]. MTPM generates
persistent and autonomous workflows at minimum time and space costs, and with
minimal restrictions on the programming paradigm.

Features                          [4]               [18]                MTPM
Location of Mobile Instruction    anywhere          restricted (more)   restricted (less)
Transparent Migration             yes               yes                 functionally equivalent
Preprocessing                     yes               no                  no
Extra Time Overhead               4%~19%            none                none
Blow-up Factor of Bytecode        3.4~4.7 (times)   none                none
Autonomy                          none              less                more
Platform Independence             yes               no                  yes

Tab. 1 Comparison of [4], [18] and MTPM



7 Conclusion 

Many WWW applications, such as mobile computing, depend on the autonomy of
mobile agents. The threads of mobile agents should be continuous and autonomous
workflows; i.e., the persistence and autonomy of an agent's thread are two basic
features of an autonomous mobile agent. Supporting thread persistence at system level
is difficult, inefficient and runtime dependent. In the context of autonomy
and heterogeneity, the widely used code migration mechanisms provide no inherent
support for the design of mobile agents. Thus, we have proposed and proved a model,
MTPM, that is suitable for designing a workflow of mobile agents. MTPM simulates
the state persistence of an agent's thread by Serialization and RMI without
introducing any new spatial complexity in the implementation.

DTP is a flexible implementation of MTPM. DTP complies with the
programming paradigm defined by MTPM, so a DTP generates a continuous
workflow when a mobile agent executes it. Because a DTP is composed of
autonomous primitives, a DTP embeds some degree of autonomy or "intelligence"
into mobile agents. Using a DTP, navigational and computational autonomies are
carried by mobile agents as they travel through the underlying computational
networks. A mobile agent can freely travel and use many different computational
resources in a heterogeneous network by executing autonomous primitives in a DTP
without interaction with its owner.

For supporting the coexistence of persistence, mobility and autonomy, we have
presented a basic framework, MTPM, with its implementation model, DTP, in this
paper. Our future work will focus on investigating the suitability of MTPM and DTP
in WWW applications, such as Internet information retrieval, electronic commerce
and Computer Supported Cooperative Work (CSCW). From feedback of the
investigations, we can find problems in MTPM and DTP, and make improvements in
both the model and its implementation.

Abstract. A Distributed Vision System (DVS) is an infrastructure for
mobile robot navigation. The system consists of vision agents embedded
in an environment and connected with a computer network; it observes
events in the environment and provides various information to robots.
We have developed a prototype of the DVS which consists of sixteen
vision agents and simultaneously navigates two robots. This paper
describes the details of the prototype system, shows its robustness through
experimentation, and considers problems in applying a multi-agent
system to a real robot system through development of the DVS.



1 Introduction 

For limited environments such as offices and factories, several types of au- 
tonomous robots which behave based on visual information have been developed. 
However, it is still hard to realize autonomous robots behaving in dynamically 
changing real worlds such as an outdoor environment. 

As discussed in Active Vision [1], the main reason lies in attention control to 
select viewing points according to various events relating to the robot. In order 
to simultaneously execute various vision tasks such as detecting free regions 
and obstacles in a complex environment, the robot needs attention control, with 
which the robot can change its gazing direction and collect information according 
to various events. If the robot has a single vision sensor, it needs to switch among several
vision tasks in a time slicing manner in order to simultaneously execute them
(Temporal Attention Control). Furthermore, in order to execute various vision
tasks the robot needs to select the best viewing point according to the vision
tasks (Spatial Attention Control).

However, it is difficult with current technologies to realize the attention con- 
trol. The following reasons can be considered: 

— Vision systems of previous mobile robots are fixed on the mobile platforms 
and it is difficult to acquire visual information from proper viewing points. 

— For a single robot, it is difficult to acquire a consistent model of a wide 
dynamic environment and to maintain it. 

In order to simultaneously execute the vision tasks, an autonomous robot needs
to change its visual attention. The robot, generally, has a single vision sensor
and a single body, therefore the robot needs to make complex plans to execute
the vision tasks with the single vision sensor.

Fig. 1. Distributed Vision System

Our idea to solve the problem is to use many vision agents (VAs) embedded 
in the environment and connected with a computer network (See Figure 1). Each 
VA independently observes events in the local environment and communicates 
with other VAs through the computer network. Since the VAs do not have any 
constraints in the mechanism like autonomous robots, we can install a sufficient 
number of VAs according to tasks, and the robots can acquire necessary visual 
information from various viewing points. As a new concept to generalize the
idea, we propose Distributed Vision, in which multiple vision agents embedded in an
environment recognize dynamic events by communicating with each other. In
the distributed vision, the attention control problems are dealt with as dynamic
organization problems of communication between the vision agents. The details of
the concept of the distributed vision are discussed in [2]. Based on the concept, we
have developed a Distributed Vision System (DVS) [2], which solves the above 
problems and realizes robust navigation of mobile robots in a complex and dy- 
namic environment such as outdoor environments. The DVS consists of VAs, 
solves the attention control by selecting proper VAs, and realizes navigation of 
mobile robots based on visual information in a complex environment. 

In this paper, development of a prototype of the DVS is reported. In devel- 
oping the DVS, the following points are important: 



1. Navigation method of multiple robots using multiple VAs 

2. Communication among VAs 

3. Construction and management of environment models in the VA network 



In this paper, we mainly discuss 1. In the following, the architecture of the 
prototype system and the navigation method of robots are described. Finally, 
experimental results of navigation are shown. 

2 Distributed Vision System

2.1 Design Policies for the DVS 

The VAs, which the DVS consists of, are designed based on the following idea: 

Tasks of robots are closely related to local environments. 

For example, when a mobile robot executes a task of approaching a target, the
task is closely related to the local area where the target is located. This idea allows
us to give VAs specific knowledge for recognizing the local environment, therefore
each VA has a simple but robust information processing capability.

More concretely, the VAs can easily detect dynamic events since they are fixed
in the environment. A vision-guided mobile robot whose camera is fixed
on its body has to move to explore the environment, so it faces the
difficult problem of recognizing the environment through a moving camera. On
the other hand, a VA in the DVS easily analyzes the image data and detects
moving objects by constructing the background image for its fixed viewing point.

All of the VAs, basically, have the following common visual functions: 

— Detecting moving objects by constructing the background image and com- 
paring the current image with it. 

— Tracking detected objects by a template matching method. 

— Identifying mobile robots based on given models. 

— Finding relations between moving objects and static objects in the images. 
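
The first of these functions, detection by comparison with a background image, can be sketched as follows. This is only an illustration under stated assumptions: images are plain lists of grayscale intensities, and the running-average update, the `rate` and the `threshold` values are hypothetical choices that the paper does not specify. Tracking by template matching and robot identification are not covered here.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]  # grayscale image stored as rows of intensities


def update_background(background: Image, frame: Image, rate: float = 0.05) -> Image:
    """Blend the current frame into the background (running average, assumed)."""
    return [[(1.0 - rate) * b + rate * f for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


def moving_mask(background: Image, frame: Image,
                threshold: float = 30.0) -> List[List[bool]]:
    """Mark pixels whose intensity differs strongly from the background."""
    return [[abs(f - b) > threshold for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


def bounding_box(mask: List[List[bool]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of the marked pixels, or None."""
    cells = [(r, c) for r, row in enumerate(mask)
             for c, hit in enumerate(row) if hit]
    if not cells:
        return None
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)
```

Because the viewing point is fixed, a per-pixel comparison of this kind is sufficient to separate moving objects from the scene, which is the advantage the design policy relies on.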

The DVS, which does not keep the precise camera positions for robustness 
and flexibility, autonomously and locally calibrates the camera parameters with 
local coordinate systems according to demand (the detail is discussed in Sec- 
tion 3.3). In addition, the DVS does not use a geometrical map in robot nav- 
igation. It memorizes robot tasks directly taught by a human operator, then 
navigates robots based on the memorized tasks. 



2.2 The Architecture 

Figure 2 shows the architecture of the DVS for robot navigation. The system 
consists of multiple VAs, robots, and a computer network connecting them. 

Image processor detects moving robots and tracks them by referring to 
Knowledge database which stores visual features of robots. Estimator receives the 
results and estimates camera parameters for establishing representation frames 
for sharing robot motion plans with other VAs. Task manager memorizes the 
trajectories of robots as tasks taught by a human operator, and selects proper 
tasks in the memory in order to navigate robots. Planner plans robot actions 
based on the memorized tasks and the estimated camera parameter. Organiza- 
tion manager communicates with other VAs through Communicator and selects 
proper plans. The selected plans are memorized in Memory of organizations for 
planning robot tasks more properly. Controller controls the modules, according
to requests from the robots and the system state such as teaching phase and
navigation phase.

Fig. 2. The architecture of the DVS

The robot receives the plans through Communicator. Command Integrator 
selects and integrates the plans, from which actuator commands are generated. 

3 Development of a Prototype System 

We have developed a prototype system for robot navigation based on the archi- 
tecture described in the previous section. The system consists of sixteen VAs, and 
simultaneously navigates two robots. In the DVS, robot navigation is achieved 
with two steps — task teaching phase and navigation phase. In the task teaching 
phase, each VA observes and memorizes the path of a robot which is controlled 
by a human operator or autonomously moves in the environment. In the
navigation phase, VAs communicate with each other, select VAs which give proper
visual information for navigation, and navigate robots based on the paths
memorized as trajectories in each sensor image. In the following, the details of the
system are described. 



3.1 Task Teaching 

The system needs the following functions in order to navigate robots: 

1. Navigate robots on free regions where the robots can move. 

2. Avoid a collision with other robots and obstacles. 

3. Navigate robots to their destinations. 

Fig. 3. Detecting free regions (panels: background images, detected free regions)



In the DVS, these functions are realized by using information from VAs.
Function 1 is realized as follows. Knowledge of the free regions is obtained
if VAs observe moving objects in the environment for a long time, assuming that
the regions where the objects move around are free regions. In the developed
system, VAs observe a robot used for task teaching in the teaching phase (described
below), and recognize free regions (see Figure 3). Function 2 is realized by
generating a navigation plan that avoids a collision when the VAs predict a
collision of a robot with other robots or obstacles. In the developed system, VAs
check the paths of robots, and if a collision is expected, VAs temporarily
correct the destination of the navigation plan in order to avoid the collision. Touch
sensors on the robots are also used, since VAs cannot acquire the information
needed to avoid a collision when the robots are close to each other.
Function 3 is realized by teaching knowledge of paths for navigating robots
to their destinations. In the following, the task teaching phase is described.
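
As an illustration of function 1, free regions can be accumulated from the image positions at which the bottom of the teaching robot has been observed. The sketch below is an assumption-laden simplification: pixels are (row, column) pairs, and the `radius` by which each observation is widened is a hypothetical parameter that does not appear in the paper.

```python
from typing import Iterable, Set, Tuple

Pixel = Tuple[int, int]  # (row, column) in one VA's image


def learn_free_region(bottom_positions: Iterable[Pixel], radius: int = 1) -> Set[Pixel]:
    """Mark every pixel within `radius` of an observed robot bottom as free."""
    free: Set[Pixel] = set()
    for r, c in bottom_positions:
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                free.add((r + dr, c + dc))
    return free


def on_free_region(free: Set[Pixel], bottom: Pixel) -> bool:
    """Accept a detection only if the object's bottom lies on a free region."""
    return bottom in free
```

The same set can later be used as a filter in the navigation phase, so that a detected object is treated as a robot only when its bottom lies on a learned free region.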

The system switches into the task teaching phase with instruction of a human 
operator. In the task teaching phase, VAs memorize tasks shown by a human 
operator. The task consists of several subtasks, which are, in this experimenta- 
tion, movement from an intersection to another intersection. By connecting the 
subtasks, the system navigates robots to their destination. 

First, VAs detect robots in each sensor image. Since the VAs are fixed in an 
environment, they can easily detect objects by comparing the current image with 
the background image stored in advance. Then, robots are distinguished from 
other objects and identified by their colors, in this experimentation. After a robot 
has been detected, each VA tracks the robot, which is controlled by a human 
operator, and memorizes the trajectory as a task. When the robot passes over 
a specific place (e.g., in front of a building), the operator notifies the meaning 
of the place to the system. The system divides the tasks taught by a human 
operator into several subtasks, which are movement between intersections. In 
this experimentation, the subtasks are directly taught by a human operator in
order to simplify the experimentation, since the experimental environment is
small and there are only two intersections.

Fig. 4. Overlaps of VAs' visual fields

The VAs can more robustly detect robots by redundant observation. In
Figure 4, for example, the visual fields of VA1 and VA4, and those of VA3 and
VA4 do not overlap. Therefore, if all VAs observe a robot with the same visual
features, it is estimated that VA4 observes a robot different from other VAs'.
In the developed system, the overlaps are acquired by simultaneously observ- 
ing a robot used in the task teaching phase from all VAs, then they are used 
in the navigation phase. In addition, the VAs can robustly detect robots using 
knowledge of local environments. For example, assuming that robots exist only 
on free regions, the VAs detect only robots whose bottom is on the free region, 
examples of which are shown in Figure 3. 

3.2 Navigation of Mobile Robots 

After the task teaching phase, the DVS navigates robots in the environment by 
iterating the following process (see Figure 5): 

1. A robot sends a request to the DVS to navigate itself to a destination. 

2. VAs in the DVS communicate with each other and determine paths to nav- 
igate the robot. 

3. Each VA sets a navigation target near the robot in the VA’s view, then 
generates a navigation plan and sends it to the robot. 

4. The robot receives the navigation plans from the VAs, then selects proper 
ones, integrates them, and moves based on the integrated plan. 

The details are described below. 

In order to generate a navigation plan for a robot, VAs have to identify the 
robot in their views. Here, the VAs identify the robot in the same way as the 
teaching phase. 

Next, each VA generates a navigation plan. First, the VA estimates the nearest
point on the memorized paths from the robot in its view (see Figure 6(1)).
Then, the VA sets a navigation target at a certain distance $\Delta t$ from the
estimated position (see Figure 6(2)), computes an apparent angle $\theta_i$ between the
direction to the target and the current motion direction of the robot (see
Figure 6(3) and (4)), and transforms it into an angle $\theta_i^*$ represented on the plane on
which the robot moves as follows:

$$\theta_i^* = \frac{\theta_i}{\sin\alpha_i} \qquad (1)$$

where $\alpha_i$ is the tilt angle of the vision sensor of VA $i$, which is automatically
estimated by observation (see Section 3.3). Each VA sends $\theta_i^*$ to the robot as a
navigation plan. When a VA has set a navigation target and detects an obstacle
between the target and the robot, the VA corrects the target so as to avoid the
obstacle. VAs also correct the navigation target in order to avoid a collision of a
robot with other robots by regarding the other robots as obstacles.

Fig. 5. Navigation of mobile robots

Fig. 6. Generating a navigation plan


After the robot has received the navigation plans ($\theta_i^*$) from VAs, the robot
eliminates navigation plans which include large errors, in order to select proper plans
for navigation, in other words, plans which are generated by
VAs observing the robot from proper viewing points. In order to select proper
plans, the robot estimates the error of $\theta_i^*$. The error of $\theta_i^*$ is caused by an
observation error of the motion direction of the robot, and an estimation error of $\alpha_i$
(the estimated tilt angle of VA $i$). Here, we assume that the former error, i.e., the
observation error of the motion direction of the robot, is inversely proportional
to the apparent robot size $w_i$ in the view of VA $i$. The latter error (let this be
$\Delta\theta_i^*$), which is related to the error of the estimated tilt angle, is computed
from equation (1) as follows:

$$\Delta\theta_i^* = \frac{\theta_i}{\sin(\alpha_i + \Delta\alpha_i)} - \frac{\theta_i}{\sin\alpha_i} \qquad (2)$$

where $\Delta\alpha_i$ is the estimated error of $\alpha_i$. Consequently, improper navigation plans
are eliminated as follows. If the robot size in a VA's view is less than 2/3 of the
largest robot size observed by the VAs, it is assumed that the navigation plan generated
by the VA includes a relatively large error compared to other navigation plans,
and the plan is eliminated. Furthermore, after eliminating these navigation
plans, the robot also eliminates navigation plans the estimated errors of which (i.e.,
$|\Delta\theta_i^*|$) are more than twice the smallest of all.

Next, the robot integrates the remaining navigation plans. Since the navigation
plans are represented with angles on a common coordinate system along the
motion direction of the robot, they can be integrated by computing an average
angle of them. Here, the robot computes an average angle of $\theta_i^*$ weighted with
the estimated error $\Delta\theta_i^*$ as follows:

$$\theta^* = \frac{\sum_i k_i \theta_i^*}{\sum_i k_i}, \qquad k_i = \frac{w_i}{|\Delta\theta_i^*|} \qquad (3)$$

where $w_i$ is an apparent size of the robot in the view of VA $i$, and $\Delta\theta_i^*$ is the
estimated error of $\theta_i^*$ related to $\Delta\alpha_i$ (the estimated error of $\alpha_i$). Finally, the
robot generates actuator commands from $\theta^*$.
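
The selection and integration steps can be sketched as below. The sketch rests on assumptions that should be read with care: it takes the plane angle to be $\theta_i / \sin\alpha_i$ and its error to be the difference of that expression at $\alpha_i + \Delta\alpha_i$ and at $\alpha_i$, and it assumes the weight $k_i = w_i / |\Delta\theta_i^*|$; the `Plan` record, the function names and the small `eps` guard against division by zero are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Plan:
    theta: float    # apparent angle theta_i in the VA's view (radians)
    alpha: float    # estimated tilt angle alpha_i (radians)
    d_alpha: float  # estimated error of alpha_i (radians)
    size: float     # apparent robot size w_i in the VA's view


def on_plane(plan: Plan) -> float:
    """Angle on the motion plane, assuming theta* = theta / sin(alpha)."""
    return plan.theta / math.sin(plan.alpha)


def plane_error(plan: Plan) -> float:
    """Error of theta* caused by the tilt-angle error d_alpha."""
    return plan.theta / math.sin(plan.alpha + plan.d_alpha) - on_plane(plan)


def select(plans: List[Plan]) -> List[Plan]:
    """Drop plans from small views, then plans with comparatively large error."""
    largest = max(p.size for p in plans)
    big = [p for p in plans if p.size >= 2.0 / 3.0 * largest]
    smallest = min(abs(plane_error(p)) for p in big)
    return [p for p in big if abs(plane_error(p)) <= 2.0 * smallest]


def integrate(plans: List[Plan], eps: float = 1e-9) -> float:
    """Weighted average of the remaining plans (weight w_i / |error|, assumed)."""
    kept = select(plans)
    weights = [p.size / max(abs(plane_error(p)), eps) for p in kept]
    return sum(k * on_plane(p) for k, p in zip(weights, kept)) / sum(weights)
```

With this weighting, a VA that sees the robot large and whose tilt estimate is reliable dominates the integrated angle, which matches the intent of selecting VAs that observe the robot from proper viewing points.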



3.3 Estimating Camera Parameters by Observation 

In general, the position of a vision sensor is represented with six parameters:
rotation and translation parameters. Hosoda and Asada [3] proposed a method
for estimating the camera parameters by visual feedback. In the DVS, each camera
observes moving robots in an environment in order to estimate the camera
parameters in the same way. However, the bottom of the robot is regarded as
its position in the DVS, so that the robot position measured by observation is
imprecise and it is difficult to estimate all six camera parameters. Therefore,
three parameters $\alpha_i$, $\beta_i$ and $\gamma_i$ as shown in Figure 7 are estimated in an on-line
manner, which are needed for robot navigation if orthographic projection is assumed.
By using these parameters, the differential angle $\theta_i$ represented in the
view of VA $i$ is transformed into the navigation plan $\theta_i^*$ represented on the plane
on which the robot moves.

Fig. 7. Estimating camera parameters



Estimation method Let $x$ and $y$ be reference axes of rectangular coordinates,
where the direction of the $x$ axis indicates the motion direction of the robot, and
let $\alpha_i$, $\beta_i$ and $\gamma_i$ be the tilt angle of VA $i$, the angle between VA $i$ and the $y$
axis, and the rotation angle around the viewing direction of VA $i$, respectively.
Assuming orthographic projection, the velocity of the robot $V$ is projected into
the view of VA $i$ as follows:

$$V_i = S_i T_i R_i V \qquad (4)$$

where the vector $V_i = (u_i, v_i)^T$ is the velocity projected in the view of VA $i$,
$R_i$ and $S_i$ represent rotation matrices of the angles $\beta_i$ and $-\gamma_i$, respectively, and $T_i$
represents a matrix of orthographic projection:

$$R_i = \begin{pmatrix} \cos\beta_i & -\sin\beta_i \\ \sin\beta_i & \cos\beta_i \end{pmatrix} \qquad (5)$$

$$S_i = \begin{pmatrix} \cos\gamma_i & \sin\gamma_i \\ -\sin\gamma_i & \cos\gamma_i \end{pmatrix} \qquad (6)$$

$$T_i = \begin{pmatrix} 1 & 0 \\ 0 & \sin\alpha_i \end{pmatrix} \qquad (7)$$



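Hence, the velocity $V$ is represented as follows using $V_i = (u_i, v_i)^T$:

$$V = R_i^{-1} T_i^{-1} S_i^{-1} V_i = \begin{pmatrix} \cos\beta_i & \dfrac{\sin\beta_i}{\sin\alpha_i} \\ -\sin\beta_i & \dfrac{\cos\beta_i}{\sin\alpha_i} \end{pmatrix} \begin{pmatrix} u_i' \\ v_i' \end{pmatrix} \qquad (8)$$

where $u_i'$ and $v_i'$ are:

$$\begin{pmatrix} u_i' \\ v_i' \end{pmatrix} = \begin{pmatrix} \cos\gamma_i & -\sin\gamma_i \\ \sin\gamma_i & \cos\gamma_i \end{pmatrix} \begin{pmatrix} u_i \\ v_i \end{pmatrix} \qquad (9)$$

Therefore,

$$|V|^2 = u_i'^2 + \left(\frac{v_i'}{\sin\alpha_i}\right)^2 \qquad (10)$$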

If a human operator controls the robot with a constant speed, $|V|$ is a known
value. Consequently, $\alpha_i$ can be computed from the following equation:

$$\sin\alpha_i = \sqrt{\frac{v_i'^2}{|V|^2 - u_i'^2}} \quad (v_i' \neq 0) \qquad (11)$$

Furthermore, the component $y$ of the velocity $V$ is always zero, so that $\beta_i$ can
be computed from equation (8) as follows:

$$u_i' \sin\beta_i - \frac{v_i'}{\sin\alpha_i} \cos\beta_i = 0 \qquad (12)$$

By observing two velocities of a robot (i.e., observing two different $V_i$), $\alpha_i$, (two
different) $\beta_i$ and $\gamma_i$ are acquired based on equations (9), (11) and (12). In this
experimentation, however, in order to simplify the estimation, we assume $\gamma_i = 0$,
that is, the cameras are set up level with the plane on which the robots move. By
this assumption, $\alpha_i$ and $\beta_i$ are computed with one observation from equations
(11) and (12), respectively. Note that, in practice, the velocity of the robot $V_i$
is normalized with $w_i$ (the size of the robot in the view of VA $i$).

Estimation error The relation between an observation error of the robot
velocity $(\Delta u_i, \Delta v_i)$ and an estimation error of the tilt angle $\Delta\alpha_i$ is computed from
equation (11) as follows:

$$\Delta\alpha_i = \sin^{-1} \sqrt{\frac{(v_i + \Delta v_i)^2}{|V|^2 - (u_i + \Delta u_i)^2}} - \alpha_i \qquad (13)$$

where we assume $\gamma_i = 0$. Figure 8 shows $\Delta\alpha_i$ when $\alpha_i = 30°$, and $\Delta x$ and $\Delta y$
are 1%, 5% and 10% of $|V|$. In Figure 8, the horizontal axis is represented with
$\beta_i$ since $u_i$ and $v_i$ are determined by equations (11), (12), and $\beta_i$. Thus, the
estimation error $\Delta\alpha_i$ becomes larger when $\beta_i$ approaches zero, that is, when the
velocity of the robot approaches the horizontal direction in the view of VA $i$.
Note that $\Delta\alpha_i$ is used in equations (2) and (3) for integrating navigation plans
generated by multiple VAs.
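
The estimation and its sensitivity can be checked numerically with a short sketch. It assumes $\gamma_i = 0$ and a robot moving along the $x$ axis with known speed, so that the projected velocity is $(|V| \cos\beta_i, |V| \sin\beta_i \sin\alpha_i)$; the function names and the perturbation values are illustrative only and are not taken from the paper.

```python
import math
from typing import Tuple


def project(speed: float, alpha: float, beta: float) -> Tuple[float, float]:
    """Forward model: equation (4) with gamma_i = 0 and V = (speed, 0)."""
    u = speed * math.cos(beta)
    v = speed * math.sin(beta) * math.sin(alpha)
    return u, v


def estimate_alpha(u: float, v: float, speed: float) -> float:
    """Equation (11): tilt angle from one observed velocity (requires v != 0)."""
    denom = speed ** 2 - u ** 2
    if v == 0 or denom <= 0:
        raise ValueError("velocity too close to the horizontal image direction")
    return math.asin(min(1.0, abs(v) / math.sqrt(denom)))


def estimate_beta(u: float, v: float, alpha: float) -> float:
    """Equation (12): u sin(beta) - (v / sin(alpha)) cos(beta) = 0."""
    return math.atan2(v / math.sin(alpha), u)


def alpha_error(alpha: float, beta: float, du: float, dv: float,
                speed: float = 1.0) -> float:
    """Equation (13): tilt-angle error caused by observation errors (du, dv)."""
    u, v = project(speed, alpha, beta)
    return estimate_alpha(u + du, v + dv, speed) - alpha
```

For example, with $\alpha_i = 30°$ and observation errors of 1% of $|V|$, the error returned by `alpha_error` is far larger at $\beta_i = 10°$ than at $\beta_i = 60°$, in line with the observation that the estimate degrades as the robot's velocity approaches the horizontal direction in the view.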

Table 1 shows an example of the tilt angles $\alpha$ (in degrees) of four VAs estimated
by observing two kinds of robot motions as shown in Figure 9. Comparing
with the actual angles (indicated as 'Actual' in Table 1), the estimation error
becomes larger (values denoted with '*' in Table 1) when the velocity of the
robot approaches the horizontal direction as discussed above. The estimated
parameters are not exactly precise; however, the DVS can still navigate robots with
the estimated parameters, though the robots wind their way. This is because the
navigation plan is represented with a differential angle between the current motion
direction of the robot and the direction to a temporary navigation target, and the
direction (right or left) in which the robot is navigated is not affected by $\Delta\alpha_i$.
Furthermore, the DVS can successfully navigate robots by integrating navigation
plans generated by multiple VAs.

Table 1. $\alpha_i$ of vision agents estimated by observation

VA              VA1     VA2     VA3     VA4
Actual          30      31      9       28
Observation 1   21.7*   35.8    30.4*   6.49*
Observation 2   24.9    8.08*   16.9*   34.4

4 Experimentation 

We have constructed a prototype system. Figure 10 shows a model town and
mobile robots used in the experimentation. The model town, the scale of which is
1/12, has been made to reproduce enough of the realities of an outdoor environment,
such as shadows, textures of trees, lawns and houses. Sixteen VAs have been
installed in the model town and used for navigating two mobile robots.

Fig. 9. Images used for estimating $\alpha_i$ (columns: VA1 to VA4; rows: Observation 1 and Observation 2)

Fig. 10. Robots navigated by the DVS



Figure 11 shows the hardware configuration. Images taken by the VAs are
sent to image encoders (quadrant units) which integrate sixteen images into one
image, then sent to a color frame grabber. The size of the whole image is 640x480
pixels, therefore each VA's image is 160x120 pixels. The main computer, a Sun
Sparc Station 10, executes sixteen VA modules, which process data from the color
frame grabber at 5 frames per second, and communicate with the two robots
through serial devices. The robots avoid a collision based on the VAs' commands;
however, if a collision is detected with their touch sensors, they move backward
and change direction in order to avoid the collision.
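
The image sizes above follow from dividing the 640x480 frame into sixteen equal tiles. The sketch below makes the arithmetic explicit under the assumption, not stated in the paper, that the quadrant units arrange the sixteen images as a 4x4 grid in row-major order; `tile_rect` is a hypothetical helper.

```python
from typing import Tuple


def tile_rect(va_index: int, cols: int = 4, rows: int = 4,
              width: int = 640, height: int = 480) -> Tuple[int, int, int, int]:
    """Return (x, y, w, h) of one VA's sub-image in the combined frame."""
    if not 0 <= va_index < cols * rows:
        raise IndexError("VA index out of range")
    tile_w, tile_h = width // cols, height // rows  # 160 x 120 for a 4 x 4 grid
    row, col = divmod(va_index, cols)
    return col * tile_w, row * tile_h, tile_w, tile_h
```

Each VA module would then read only its own rectangle from the grabbed frame, which is what allows a single frame grabber to serve all sixteen VAs.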

First, a human operator shows VAs two robots and each VA memorizes their
colors (red and black) in order to distinguish them. Then, the operator teaches
several paths by using one of the robots. Finally, the system simultaneously
navigates the robots along the taught paths. Figure 12 shows images taken by
VAs in the navigation phase. The vertical axis and the horizontal axis indicate
the time stamp and the ID numbers of the VAs, respectively. The solid rectangles
and the broken rectangles indicate selected VAs for navigating the red robot
and the black robot, respectively. As shown in Figure 12, VAs are dynamically
selected according to navigation tasks. That is, the system navigates robots
observing them from proper viewing points.

Fig. 11. Hardware configuration (components shown: quadrant units, Sun SS10)

Figure 10 shows robot trajectories navigated by the DVS exhibited at the
international conference IJCAI'97. The sixteen cameras were set up so as to
cover the whole environment. Although their locations were not measured, the 
system continuously navigated the robots for three days during the exhibition. 
The concept of the DVS, such as simple vision functions, flexible navigation 
strategies and redundant visual information, realizes robust navigation in such 
a complex environment. 

5 Discussion and Conclusion 

We have developed a prototype of the DVS. In this paper, we mainly described 
the details of the navigation method of mobile robots using multiple VAs. In 
addition, the prototype system partly deals with communication among VAs 
and robots and construction and management of environment models, which 
represent relations between VAs and robots through navigation tasks. With the 
experimental result of robot navigation, we have confirmed that the DVS can 
robustly navigate mobile robots in a complex world. 

In distributed artificial intelligence, several fundamental works such as Dis- 
tributed Vehicle Monitoring Testbed (DVMT) [4] and Partial Global Planning 
(PGP) [5] dealing with systems using multiple sensors have been reported. In 
these systems, which are based on the blackboard model [6], agents symbolize 
sensory information with a common representation, and gradually proceed with 
their recognition by exchanging them. Thus, these systems deal with recognition
based on symbolized information. On the other hand, the purpose of the DVS 
is to navigate robots. In the teaching phase, the VAs independently memorize 
paths as robot tasks from their own viewing points without symbolizing them. 
In the navigation phase, the VAs plan a global path of a robot by communi- 
cating with each other, generate instant navigation plans, and finally the robot 
generates an instant actuator command from the plans. Thus, the DVS deals 
with motion recognition by multiple agents, and regeneration of the robot tasks 
by cooperation of the agents. 

As a future work, more detailed communication among VAs should be con- 
sidered. In the experimentation in Section 4, a human operator controls a robot 
to show robot tasks while directly indicating specific places (e.g., intersections), 
and the VAs learn the tasks by observing the controlled robot motion. In this 
process, if the VAs identify the specific places by themselves to learn subtasks
(i.e., movement between intersections) autonomously, the VAs need to
communicate with each other in order to construct and maintain a consistent
environmental model. In addition, in the navigation phase, the VAs communicate with
each other to make plans for navigating robots to their destinations. However, 
in the experimentation, the communication is simplified and specialized for the 
small experimental environment. For real world applications, more sophisticated 
communication will be needed in order to perform flexible planning by VAs’ 
communications. 

Furthermore, the following problems should be considered for extending the 
scale of the DVS: 

— More accurate identification of multiple robots 

— Dynamic organization for navigating many robots by a limited number of 
VAs 

With respect to the first point, the DVS does not assume a geometrical map, in
order to keep the system robust and flexible. Instead, identification will be
achieved by considering relations between robot commands and the actual robot
movement observed by the VAs, and by utilizing, for example, a qualitative
map [7] which represents rough positional relations of the VAs. With respect to
the second point, we have to analyze the behavior of the system in such a
situation and develop more sophisticated communication among the VAs.

The DVS is considered as a growing infrastructure for robots consisting of
sensors and networks. On the other hand, recent developments in multimedia
computing environments have placed a huge number of cameras and computers in
offices and towns. These, including the DVS, are expected to be integrated and
to become more intelligent systems, forming a Perceptual Information
Infrastructure (PII) [2] which provides various information to real-world
agents such as robots and humans.
Abstract. In multi-agent reinforcement learning systems, it is important to
share a reward among all agents. We focus on the Rationality Theorem of Profit
Sharing [5] and analyze how to share a reward among all profit sharing agents.
When an agent gets a direct reward $R$ ($R > 0$), an indirect reward $\mu R$
($\mu \ge 0$) is given to the other agents. We have derived the necessary and
sufficient condition to preserve the rationality as follows:

$$\mu < \frac{M-1}{M^{W}\left(1-(1/M)^{W_0}\right)(n-1)L}$$

where $M$ and $L$ are the maximum numbers of conflicting rules and of
conflicting rational rules in the same sensory input, $W$ and $W_0$ are the
maximum episode lengths of a direct-reward and an indirect-reward agent, and
$n$ is the number of agents. This theorem is derived by avoiding the least
desirable situation, in which the expected reward per action is zero.
Therefore, if we use this theorem, we can enjoy several benefits of reward
sharing. Through numerical examples, we confirm the effectiveness of this
theorem.



1 Introduction 

Achieving cooperation in multi-agent systems is a very desirable goal. In
recent years, the bottom-up approach to multi-agent systems, which contrasts
remarkably with the top-down approach in DAI (Distributed Artificial
Intelligence), has prevailed. Recently, approaches that realize cooperation by
reinforcement learning have been notable.

There is much literature [12,9,11,8,7,1,2] on multi-agent reinforcement learn- 
ing. Q-learning [10] and Classifier System [4] are used in [12,9,7] and [11,8], 
respectively. Previous works [1,2] compare Profit Sharing [3] with Q-learning in 
the pursuit problem [1] and the cranes control problem [2]. These papers [1,2] 
claim that Profit Sharing is suitable for multi-agent reinforcement learning sys- 
tems. 

In multi-agent environments where there is no negative reward, it is important
to share a reward among all agents. Conventional work has used ad hoc sharing
schemes. Though reward sharing may help to improve learning speeds and
qualities, it can also damage system behavior. In particular, it is important
to preserve the rationality condition that the expected reward per action is
larger than zero.

In this paper, we aim to preserve the rationality condition in multi-agent
environments where there is no negative reward. We focus on the Rationality
Theorem of Profit Sharing [5] and analyze how to share a reward among all
profit sharing agents. We show the necessary and sufficient condition to
preserve the rationality condition in multi-agent profit sharing systems. If we
use this theorem, we can enjoy several benefits of reward sharing while
avoiding the least desirable situation, in which the expected reward per action
is zero.

Section 2 describes the problem, the method and the notation. Section 3
presents the necessary and sufficient condition to preserve the rationality
condition in multi-agent profit sharing systems. Section 4 shows numerical
examples that illustrate the theorem. Section 5 concludes the paper.

2 The Domain 

2.1 Problem Formulation 

Consider $n$ ($n > 0$) agents in an unknown environment. At each discrete time
step, agent $i$ ($i = 1, 2, \ldots, n$) is selected from the $n$ agents based on
the selection probabilities $P_i$ ($P_i > 0$, $\sum_{i=1}^{n} P_i = 1$), and it
senses the environment and performs an action. The agent senses a set of
discrete attribute-value pairs and performs one of $M$ discrete actions. We
denote agent $i$'s sensory inputs as $x_i, y_i, \ldots$ and its actions as
$a_i, b_i, \ldots$. A pair of a sensory input and an action is called a rule.
We denote the rule "if $x$ then $a$" as $xa$. The function that maps sensory
inputs to actions is called a policy. We call a policy rational if and only if
the expected reward per action is larger than zero. The policy that maximizes
the expected reward per action is called an optimal policy.

When the $n'$th agent ($0 < n' < n$) has a special sensory input on condition
that $(n' - 1)$ agents have special sensory inputs at some time step, the
$n'$th agent gets a direct reward $R$ ($R > 0$) and the other $(n - 1)$ agents
get an indirect reward $\mu R$ ($\mu \ge 0$). We call the $n'$th agent the
direct-reward agent and the other $(n - 1)$ agents indirect-reward agents. We
do not have any information about $n'$ or the special sensory input.
Furthermore, nobody (including reward designers) knows whether the $(n - 1)$
agents other than the $n'$th agent are important or not. A set of $n'$ agents
that are necessary for getting a direct reward is called the goal-agent set. In
order to preserve the rationality condition that the expected reward per action
is larger than zero, all agents in a goal-agent set must learn a rational
policy.

We show an example of direct and indirect rewards in a pursuit problem
(Fig. 1). There are 6 hunter agents (H0, H1, ..., H5) and one prey agent (Pry).
When H0 moves down (Fig. 1a) and 4 hunter agents surround the prey agent as
shown in Fig. 1b, the direct reward is given to H0 and the indirect reward is
given to the other agents (H1, H2, ..., H5). The number of agents that receive
the indirect reward is 5 (= 6 - 1) because we have no information about $n'$
(= 4) or the special sensory input. Traditionally, in this case, the direct
reward is given to H0, H1, H2 and H3, which means that we know which agents are
important for catching the prey. However, we cannot always have such
information (for example, H5 may be important). Therefore, our formulation is
more general than those that have been used previously.

Fig. 1. The pursuit problem to explain direct and indirect rewards.

In general, we can consider other multi-agent environments where several
agents sense the environment and perform their actions asynchronously. In this
case, there are several problems, for example, how to set the reward function
and how to decide the priority of their actions. Therefore, we have taken more
basic environments where one agent senses the environment and performs its
action at each discrete time step. Furthermore, we do not use negative rewards
or reward values. Though such rewards are given to learning agents in most
reinforcement learning systems, it is difficult to decide appropriate reward
values and negative rewards. Therefore, we have regarded a reward as a good
signal only.



2.2 Profit Sharing in Multi-agent Environments 

The purpose of this paper is to guarantee the rationality of Profit Sharing
(PS) [3] in the multi-agent environments discussed above. When a reward is
given to an agent, PS reinforces, all at once, the rules on an episode, that
is, the sequence of rules selected between rewards. In multi-agent
environments, an episode is interpreted by each agent separately. For example,
when 3 agents select the rule sequence shown in Fig. 2 and have special sensory
inputs (for getting a reward), the sequence contains one episode for each of
agents 1, 2 and 3; the episode of agent 1 is $(x_1a_1 \cdot y_1a_1)$ (Fig. 3).
In this case, agent 3 gets a direct reward and the other agents get an indirect
reward.

We call a subsequence of an episode a detour when the sensory input of the
first selecting rule and the sensory output of the last selecting rule are the
same though both rules are different. For example, the episode of agent 2
contains one detour (Fig. 3). The rules that always exist on a detour do not
contribute to getting a direct reward. We call a rule ineffective if and only
if it always exists on a detour. Otherwise, a rule is called effective. For
example, in the detour of Fig. 3, $y_2a_2$ is an ineffective rule, whereas
$x_2a_2$ is an effective rule because it does not exist on a detour at the
first sensory input of agent 2's episode.

Fig. 2. A rule sequence when 3 agents select an action in some order.
($x_i, y_i, z_i$: sensory inputs of agent $i$; $a_i, b_i$: actions of agent $i$.)

Fig. 3. Three episodes and one detour in figure 2.
($x_i, y_i, z_i$: sensory inputs of agent $i$; $a_i, b_i$: actions of agent $i$.)

We call a function that shares a reward among rules on an episode a
reinforcement function. The term $f_i$ denotes the reinforcement value for the
rule selected $i$ steps before a reward is acquired. The weight $S_{r_i}$ of
rule $r_i$ is reinforced by $S_{r_i} = S_{r_i} + f_i$ for an episode
$(r_{W_a-1} \cdots r_i \cdots r_1 \cdot r_0)$, where $W_a$ is the length of the
episode, called the reinforcement interval. When a reward $f_0$ is given to the
agent, we use the following reinforcement function, which satisfies the
Rationality Theorem of PS [5]:

$$f_n = \frac{1}{M} f_{n-1}, \qquad n = 1, 2, \ldots, W_a - 1, \qquad (1)$$

where $(f_0, W_a)$ is $(R, W)$ for the direct-reward agent and $(\mu R, W_0)$
($W_0 \le W$) for indirect-reward agents. For example, in figure 3, the weights
of $x_1a_1$ and $y_1a_1$ are reinforced by
$S_{x_1a_1} = S_{x_1a_1} + \frac{1}{M}\mu R$ and
$S_{y_1a_1} = S_{y_1a_1} + \mu R$, respectively.
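
As a rough illustration of how a geometrically decreasing reinforcement
function distributes credit along an episode, the following Python sketch
reinforces the rules of one indirect-reward agent. It assumes a ratio of $1/M$
between successive reinforcement values; the function name `reinforce`, the
tuple encoding of rules, and the sample values of `mu` and `M` are hypothetical
choices made for illustration, while the initial weight 100.0 and $R = 100.0$
follow the numerical example in Section 4.

```python
from collections import defaultdict


def reinforce(weights, episode, f0, M):
    """Reinforce every rule on one episode (oldest rule first).

    The last rule (selected just before the reward) receives f0, and
    each earlier rule receives 1/M of the value given to its successor.
    """
    credit = f0
    for rule in reversed(episode):
        weights[rule] += credit
        credit /= M
    return weights


R, mu, M = 100.0, 0.05, 2              # mu and M are illustrative values
weights = defaultdict(lambda: 100.0)   # every rule starts at weight 100.0

# Episode of an indirect-reward agent: (x1 a1 . y1 a1), so f0 = mu * R.
episode = [("x1", "a1"), ("y1", "a1")]
reinforce(weights, episode, mu * R, M)

print(weights[("y1", "a1")])  # last rule: 100 + mu * R
print(weights[("x1", "a1")])  # earlier rule: 100 + mu * R / M
```

A rule that appears several times in an episode is reinforced once per
occurrence in this sketch, which is one plausible reading of the update rule.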

2.3 Properties of the Target Environments 

The paper [2] claims that two main problems should be considered in multi-agent
reinforcement learning systems. One is the perceptual aliasing problem [12],
which is due to the agent's sensory limitation. The other is the uncertainty of
state transition problem, which is due to concurrent learning among the
agents [8,1]. We can treat these problems in terms of two confusions [6].

We call indistinction of state values a type 1 confusion. Figure 4a is an
example of the type 1 confusion. In this example, the state value ($v$) is the
minimum number of steps to a reward. The values of states 1a and 1b are 2 and
8, respectively. Though states 1a and 1b are different states, the agent senses
them as the same sensory input (state 1). If the agent experiences states 1a
and 1b equally often, the value of state 1 becomes 5 (= (2 + 8)/2). Therefore
the value of state 1 is higher than the value of state 4 (which is 6). If the
agent uses state values, it would like to move left in state 3. However the
agent should move right in state 1. It means that the agent learns the
irrational policy in which it only moves between states 1b and 3.

Fig. 4. a) An example of the type 1 confusion. b) An example of the type 2
confusion.

We call indistinction of rational and irrational rules a type 2 confusion.
Figure 4b is an example of the type 2 confusion. Though the action to move up
in state 1a is irrational, it is rational in state 1b. Since the agent senses
states 1a and 1b as the same sensory input (state 1), the action to move up in
state 1 is regarded as rational. If the agent learns the action to move right
in state S, it takes the irrational policy that only moves between states 1a
and 2.
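
The arithmetic behind the type 1 confusion can be checked with a few lines of
Python. This is only a sketch of the averaging step quoted in the text: the
dictionary keys are hypothetical labels for the states of Fig. 4a, and equal
visiting frequency of states 1a and 1b is assumed.

```python
# Minimum number of steps to a reward, as quoted in the text for Fig. 4a.
steps_to_reward = {"1a": 2, "1b": 8, "4": 6}

# States 1a and 1b are sensed as the same input "1"; assuming they are
# visited equally often, the learned value is their plain average.
aliased_state_1 = (steps_to_reward["1a"] + steps_to_reward["1b"]) / 2

# The aliased state 1 now looks closer to the reward than state 4 ...
looks_better = aliased_state_1 < steps_to_reward["4"]
# ... although the hidden state 1b is actually farther away than state 4.
actually_worse = steps_to_reward["1b"] > steps_to_reward["4"]

print(aliased_state_1, looks_better, actually_worse)
```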

In general, if there is a type 2 confusion in some sensory input, there is also
a type 1 confusion in it. By these confusions, we can classify multi-agent
environments into three classes as shown in figure 5. Markov Decision Processes
(MDPs), which are treated by many reinforcement learning systems, belong to
class 1. Q-learning (QL) [10], which guarantees the acquisition of an optimal
policy in MDPs, is deceived by the type 1 confusion since it uses state values
to make a policy. PS is not deceived by this confusion since it does not use
state values. On the other hand, reinforcement learning systems that use
weights (including QL and PS) are deceived by the type 2 confusion.

Fig. 5. Three classes in multi-agent environments.

The Rationality Theorem of PS [5] guarantees the acquisition of a rational
policy in the class where there is no type 2 confusion. Since we use the
theorem, we must assume this class. Note that the class contains a part of the
two main problems (perceptual aliasing and uncertainty of state transition) in
multi-agent reinforcement learning systems. For example, an environment in
which positive state transition probabilities never change to zero does not
have any type 2 confusion, even if there are perceptual aliasing and
uncertainty of state transition problems in the environment. Therefore, our
target environments are meaningful as multi-agent environments. In the next
section, we extend the Rationality Theorem of PS to the multi-agent
environments.



3 Rationality Theorem in Multi-agent Reinforcement 
Learning 

3.1 The Basic Idea 

In this section, we derive the necessary and sufficient condition to preserve
the rationality condition in the multi-agent profit sharing systems discussed
in the previous section. We call effective rules that will be learned by the
Rationality Theorem of PS rational rules, and the others irrational rules. We
show the relationship of these rules in figure 6. Irrational rules should not
be reinforced when they conflict with rational rules. When $\mu > 0$, some
irrational rule might be judged to be an effective rule. Therefore, it is
important to suppress all irrational rules among the effective rules.

Fig. 6. The relationship of rational, irrational, effective and ineffective rules.



In order to preserve the rationality condition, all irrational rules in a
goal-agent set must be suppressed. Conversely, if a goal-agent set is
constructed from agents whose irrational rules have all been suppressed, we can
preserve the rationality condition. Therefore, we derive the necessary and
sufficient condition on the range of $\mu$ to suppress all irrational rules in
some goal-agent set.

First, we characterize the conflict structure in which it is the most difficult
to suppress irrational rules. For two conflict structures A and B, we say A is
more difficult than B when the range of $\mu$ that can suppress any irrational
rule of A is included in that of B. Second, we derive a necessary and
sufficient condition on the range of $\mu$ to suppress any irrational rule for
the most difficult conflict structure. Last, it is extended to any conflict
structure.



3.2 Proposal of the Rationality Theorem in Multi-agent 
Reinforcement Learning 

Lemma 1 (The most difficult conflict structure) 

The most difficult conflict structure has only one irrational rule with a
self-loop. Proof is shown in Appendix A. Figure 7 is the most difficult
conflict structure, where only one irrational rule with a self-loop conflicts
with $L$ rational rules.

Fig. 7. The most difficult conflict structure.
Lemma 2 (Suppressing only one irrational rule with a self-loop)

Only one irrational rule with a self-loop in some goal-agent set can be
suppressed if and only if

$$\mu < \frac{M-1}{M^{W}\left(1-(1/M)^{W_0}\right)(n-1)L} \qquad (2)$$

where $M$ is the maximum number of conflicting rules in the same sensory input,
$L$ is the maximum number of conflicting rational rules, $W$ is the maximum
episode length of a direct-reward agent, $W_0$ is the reinforcement interval of
indirect-reward agents, and $n$ is the number of agents. Proof is shown in
Appendix B.

By using the law of transitivity, the following theorem is directly derived
from these lemmas.

Theorem 1 (Rationality theorem in multi-agent reinforcement learning)

Any irrational rule in some goal-agent set can be suppressed if and only if

$$\mu < \frac{M-1}{M^{W}\left(1-(1/M)^{W_0}\right)(n-1)L} \qquad (3)$$

where $M$ is the maximum number of conflicting rules in the same sensory input,
$L$ is the maximum number of conflicting rational rules, $W$ is the maximum
episode length of a direct-reward agent, $W_0$ is the reinforcement interval of
indirect-reward agents, and $n$ is the number of agents.


3.3 The Meaning of Theorem 1 

Theorem 1 is derived by avoiding the least desirable situation, in which the
expected reward per action is zero. Therefore, if we use this theorem, we can
enjoy multiple benefits of indirect rewards, including improvement of learning
speeds and qualities.

We cannot know the value of $L$ in general. However, in practice, we can set
$L = M - 1$.

We cannot know the value of $W$ in general. However, in practice, we can set
$\mu = 0$ if the length of an episode is larger than $W$.

If we set $L = M - 1$ and $W_0 = W$, theorem 1 is simplified as follows:

$$\mu < \frac{1}{(M^{W}-1)(n-1)} \qquad (4)$$
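
The bound can be evaluated numerically with the short Python sketch below. It
assumes that the condition of Theorem 1 reads
$\mu < (M-1) / \left(M^{W}(1-(1/M)^{W_0})(n-1)L\right)$, a form reconstructed
from the derivation in Appendix B, and that the simplified form for
$L = M - 1$, $W_0 = W$ is $\mu < 1 / \left((M^{W}-1)(n-1)\right)$. The function
names are hypothetical. With $M = 2$, $W = 3$ and $n = 3$, both functions give
$1/14 = 0.0714\ldots$, which agrees with the value quoted for roulette a) in
Section 4.

```python
def mu_upper_bound(M, L, W, W0, n):
    """Upper bound on mu under the assumed general form of Theorem 1."""
    if n < 2:
        raise ValueError("at least one indirect-reward agent is required")
    return (M - 1) / (M**W * (1 - (1 / M) ** W0) * (n - 1) * L)


def mu_upper_bound_simplified(M, W, n):
    """Upper bound on mu for the special case L = M - 1 and W0 = W."""
    if n < 2:
        raise ValueError("at least one indirect-reward agent is required")
    return 1 / ((M**W - 1) * (n - 1))


# Two actions per sensory input (M = 2), W = 3, and n = 3 agents.
general = mu_upper_bound(M=2, L=1, W=3, W0=3, n=3)
simplified = mu_upper_bound_simplified(M=2, W=3, n=3)
print(general, simplified)
```

The two functions coincide whenever $L = M - 1$ and $W_0 = W$, so the
simplified form is mainly a convenience when $L$ and $W_0$ are unknown.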



4 Numerical Example 

4.1 Environments 

Consider the roulette-like environments in figure 8. There are 3 and 4 learning
agents in roulette a) and b), respectively. The initial state of agent $i$
($A_i$) is $S_i$. The number shown in the center of both roulettes (from 0 to 8
or 11) is given to each agent as a sensory input. There are two actions for
each agent: move right (20% failure) or move left (50% failure). If an action
fails, the agent cannot move. There is no situation where another agent gets
the same sensory input. At each discrete time step, $A_i$ is selected based on
the selection probabilities $P_i$ ($P_i > 0$, $\sum_i P_i = 1$).
$(P_0, P_1, P_2)$ is (0.9, 0.05, 0.05) for roulette a), and
$(P_0, P_1, P_2, P_3)$ is (0.72, 0.04, 0.04, 0.2) for roulette b). When $A_i$
reaches its goal $G_i$, $P_i$ is set to 0.0 and the $P_j$ ($j \ne i$) are
modified proportionally.

Fig. 8. Roulette-like environments used in numerical example.

When $A_r$ reaches $G_r$ on condition that the $A_i$ ($i \ne r$) have reached
$G_i$, the direct reward $R$ (= 100.0) is given to $A_r$ and the indirect
rewards $\mu R$ are given to the other agents. When some agent gets the direct
reward or $A_i$ reaches $G_j$ ($j \ne i$), all agents return to the initial
state shown in figure 8. The initial weights for all rules are 100.0.
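
One way to read "modified proportionally" is that the remaining selection
probabilities are rescaled so that they again sum to one. The Python sketch
below shows this reading; it is an assumption about the update, and the
function name `drop_agent` is hypothetical.

```python
def drop_agent(probs, i):
    """Set P_i to 0.0 and rescale the other probabilities to sum to 1.

    Assumes that 'modified proportionally' means each remaining P_j keeps
    its share relative to the other remaining probabilities.
    """
    rest = sum(p for j, p in enumerate(probs) if j != i)
    if rest == 0.0:
        # Every other agent has already been dropped; nothing to rescale.
        return [0.0] * len(probs)
    return [0.0 if j == i else p / rest for j, p in enumerate(probs)]


# Roulette a): agent 0 reaches its goal first.
after_a0 = drop_agent([0.9, 0.05, 0.05], 0)
print(after_a0)

# Roulette b): agent 3 reaches its goal first.
after_b3 = drop_agent([0.72, 0.04, 0.04, 0.2], 3)
print(after_b3)
```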

If all agents learn the policy "move right in any sensory input", it is
optimal. If at least two agents learn the policy "move left in the initial
state" or "move right in the initial state, and move left in the right side of
the initial state", it is irrational. When an optimal policy has not been
destroyed for 100 episodes, the learning is judged to be successful. We stop
the learning if agents 0, 1 and 2 learn the policy "move left in the initial
state" or the number of actions exceeds 10 thousand. Initially, we set $W = 3$.
If the length of an episode is larger than 3, we set $\mu = 0$ (see
section 3.3). From equation (4), we set $\mu < 0.0714\ldots$ for roulette a) and
$\mu < 0.0333\ldots$ for roulette b) to preserve the rationality condition.



4.2 Results 

We investigate the learning qualities and speeds in both roulettes and show
them in table 1. Tables 1a and 1b correspond to roulette a) and b),
respectively. The learning qualities are evaluated by the number of times
irrational or optimal policies are acquired in a thousand different trials with
different random seeds. The learning speeds are evaluated by the total number
of actions needed to learn a thousand optimal policies. Figures 9a and 9b show
details of the learning speeds in roulette a) and b), respectively.

Table 1. The learning qualities and speeds in roulette a) and b).

a) Roulette a)

  $\mu$       learning qualities       learning speeds
              irrational   optimal     Ave.      S.D.
  0.0         0            1000        1201.1    273.0
  $10^{-6}$   0            1000        1031.2    119.4
  0.07        0            1000         946.7    107.0
  0.3         0            1000         900.7    172.8
  0.4         1             999         910.3    221.3
  1.0         4             939        1120.0    794.3

b) Roulette b)

  $\mu$       learning qualities       learning speeds
              irrational   optimal     Ave.      S.D.
  0.0         0               0        -         -
  $10^{-6}$   0            1000        2690.2    263.8
  0.03        0            1000        2570.6    265.2
  0.2         0            1000        2474.9    402.6
  0.4         1             998        2671.8    945.2
  1.0         13            909        (illegible)

Fig. 9. Details of the learning speeds in roulette a) and b).

Though theorem 1 guarantees the rationality, it does not guarantee the
optimality. However, in both roulettes, the optimal policy has always been
learned, even somewhat beyond the range of theorem 1.

In roulette a), $\mu = 0.3$ makes the learning speed the best (Tab. 1a,
Fig. 9a). On the other hand, if we set $\mu \ge 0.4$, there are cases in which
irrational policies are learned. For example, consider the case that $A_0$,
$A_1$ and $A_2$ in roulette a) get the three rule sequences in figure 10. In
this case, if we set $\mu = 1.0$, $A_0$, $A_1$ and $A_2$ approach $G_2$, $G_0$
and $G_1$, respectively. If we set $\mu < 0.0714\ldots$, such irrational
policies are not learned. Furthermore, the learning speeds are improved.
Though it is possible to improve the learning speeds beyond the range of
theorem 1, we should respect theorem 1 to guarantee the rationality in all
environments.

Fig. 10. An example of rule sequences in roulette a).

In roulette b), $A_3$ cannot learn anything because there is no $G_3$.
Therefore, if we set $\mu = 0$, the optimal policy is not learned (Tab. 1b). In
this case, we should use the indirect reward. Table 1b and figure 9b show that
$\mu = 0.2$ makes the learning speed the best. On the other hand, if we set
$\mu > 0.3$, there are cases in which irrational policies are learned. It is an
important property of the indirect reward that the learning qualities exceed
those of the case of $\mu = 0$.

Though theorem 1 only guarantees the rationality, the numerical examples show
that it is also possible to improve the learning speeds and qualities.



5 Conclusions 

In most multi-agent reinforcement learning systems, reinforcement learning
methods for single-agent systems are used. Though it is important to share a
reward among all agents in multi-agent reinforcement learning systems,
conventional work has used ad hoc sharing schemes.

In this paper, we focus on the Rationality Theorem of Profit Sharing and
analyze how to share a reward among all profit sharing agents. We show the
necessary and sufficient condition to preserve the rationality condition in
multi-agent reinforcement learning systems. If we use this theorem, we can
enjoy multiple benefits of reward sharing, including improvement of learning
speeds and qualities, while avoiding the least desirable situation, in which
the expected reward per action is zero.

122 Kazuteru Miyazaki and Shigenobu Kobayashi

Our future projects include: 1) analyzing the improvement in learning speeds
and qualities, 2) extending the result to other fields of reinforcement
learning, and 3) finding efficient real-world applications.



Appendix 

A Proof of Lemma 1 
Fig. 11. Conflict structures used in proof.

Reinforcement of an irrational rule makes it difficult to preserve the
rationality condition under any $\mu$. Therefore, the difficulty of a conflict
structure is monotonic in the number of reinforcements for irrational rules. We
enumerate conflict structures according to the branching factor $b$ (the number
of state transitions in the same sensory input), the conflict factor $c$ (the
number of conflicting rules in it), and the count of reinforcements for
irrational rules. Though we set $L = 1$, the argument extends easily to any
number.

$b = 1$: It is clearly not difficult since there are no conflicts (Fig. 11a).

$b = 2$: When there are no conflicts (Fig. 11b), it is the same as $b = 1$. We
divide structures of $c = 2$ into two subcases. One contains a self-loop
(Fig. 11c), and the other does not (Fig. 11d). In the case of figure 11c, there
is a possibility that the self-loop rule is selected repeatedly, while the
non-self-loop rule is selected once at most. Therefore, if the self-loop rule
is irrational, it will be reinforced more than the irrational rule of
figure 11d.

$b \ge 3$: When there are no conflicts (Fig. 11e), it is the same as $b = 1$.
Consider the structure of $c = 2$ (Fig. 11f). Although the most difficult case
is that the conflict structure has an irrational rule as a self-loop, even such
a structure is less difficult than figure 11c. Considering the structure of
$c = 3$ (Fig. 11g), two conflicting rules are irrational because $L = 1$.
Therefore, the expected number of reinforcements for one irrational rule is
less than that of figure 11f. Similarly, conflict structures of $b > 3$ are
less difficult than figure 11c.

From the above discussion, it is concluded that the most difficult conflict
structure is figure 11c. Q.E.D.
B Proof of Lemma 2

Fig. 12. The most difficult multi-agent structure.

For any reinforcement interval $k$ ($k = 0, 1, \ldots, W - 1$) in some
goal-agent set, we show that there is $j$ ($j = 1, 2, \ldots, L$) satisfying
the following condition,

$$S_{ij}^{k} > S_{i0}^{k} \qquad (5)$$

where $S_{ij}^{k}$ is the weight of the $j$th rational rule ($r_{ij}^{k}$) in
agent $i$ and $S_{i0}^{k}$ is the weight of the irrational rule ($r_{i0}^{k}$)
(Fig. 12).



Table 2. An example of rule sequence for all agents on some k. If ‘X changes to 
01’ or ‘02 changes to 01’ in this table, the learning in the agent that can select 
the changing rule occurs more easily. 
First, we consider the ratio of the selection number of $r_{ij}^{k}$ to that of
$r_{i0}^{k}$. When $n' = 1$ and the $L$ rational rules for each agent are
selected by all agents in turn, the minimum of the ratio is maximized (Tab. 2).
In this case, the following ratio holds (Fig. 13),

$$r_{ij}^{k} : r_{i0}^{k} = 1 : (n - 1)L \qquad (6)$$

Fig. 13. An example of rule sequence. Sequence 1 is more easily learned than
sequence 2.

Second, we consider the weights given to $r_{ij}^{k}$ and $r_{i0}^{k}$. When
the agent that gets the direct reward senses no similar sensory input in $W$,
the weight given to $r_{ij}^{k}$ is minimized. It is $R(1/M)^{W-1}$ in
$k = W$. On the other hand, when the agents that get the indirect reward sense
the same sensory input in $W$, the weight given to $r_{i0}^{k}$ is maximized.
It is $\mu R \frac{M}{M-1}\left(1-(1/M)^{W_0}\right)$ in $W \ge W_0$.

Therefore, for condition (5) to be satisfied, it is necessary that the
following condition holds,

$$R\left(\frac{1}{M}\right)^{W-1} > \mu R\,\frac{M}{M-1}\left(1-\left(\frac{1}{M}\right)^{W_0}\right)(n-1)L,$$

that is,

$$\mu < \frac{M-1}{M^{W}\left(1-(1/M)^{W_0}\right)(n-1)L}.$$

It is clearly also the sufficient condition. Q.E.D.
Abstract. This paper explores how to design good rules for multiple
learning agents in scheduling problems and investigates what kind of
factors are required to find good solutions with small computational costs.
Through intensive simulations of crew task scheduling in a space
shuttle/station, the following experimental results have been obtained: (1)
an integration of (a) a solution improvement factor, (b) an exploitation
factor, and (c) an exploration factor contributes to finding good
solutions with small computational costs; and (2) the condition part of
rules, which includes flags indicating overlapping, constraints, and same
situation conditions, supports the contribution of the above three factors.

Keywords: rule design, scheduling problem, multiple learning agents, 
organizational learning, learning classifier system 



1 Introduction 

“How to design good rules^1 in multiagent environments?” This is one of the
most important questions to answer in multiagent research, and finding such a
rule design principle is required to implement multiagent systems. However, it
is difficult to answer this question because good rules frequently change due
to the many interactions among agents. Aside from rule design in multiagent
systems, there are design frameworks or guidelines from other domains. Examples
include frameworks for communication in multiagent environments [2], guidelines
for team behaviors [5], and primitive behaviors in multiple robots [12].
However, such frameworks or guidelines obviously do not support the design of
rules in agents, even though these rules are directly connected to the
collective performance of multiagents.

^1 By the term good rules, we mean those that improve performance, which is
defined in this paper as finding good solutions with small computational costs.
In the strict sense, a good solution depends on the given problem.

Moving the focus to rule design, this issue can be categorized as follows:
(1) appropriate rule generation/acquisition and (2) attribute design in
rules^2. The former approaches usually determine the attributes in rules
beforehand and concentrate on finding an appropriate combination of attributes
from a search space of fixed size. The latter approaches, on the other hand,
seek the kinds/types of attributes needed to improve performance. Although both
approaches are important for practical and engineering use, the effect of the
former approaches is supported by the latter approaches. Thus, this paper
addresses the latter approaches^3. However, little research has focused on the
latter approaches, especially in multiagent environments, because not only the
attributes in rules but also the interactions among agents affect collective
performance.

From this background, this paper focuses on the attribute design in multia- 
gent environments, explores the relationship between attributes and properties 
embedded in multiagent interactions, and addresses frameworks or guidelines 
for rule design in multiple learning agents. However, it is quite difficult to pro- 
pose general frameworks or guidelines because this issue depends upon domains. 
Therefore, this paper starts by narrowing the argument down to scheduling prob- 
lems which are one of the major problems and discusses rule design for multiple 
learning agents in this domain. 

This paper is organized as follows. Section 2 starts by briefly explaining
scheduling problems and designs rules for multiagents. Section 3 describes a
model for investigating the effect of rule design, and an example of a
scheduling problem is shown in Section 4. Section 5 presents our simulations
and discusses experimental results. Finally, our conclusions are given in
Section 6.



2 Rule Design in Scheduling Problem 

2.1 Scheduling Problem 

In the context of scheduling problems, much research has studied scheduling
theory [3] based on operations research or domain-specific heuristic
algorithms, and has also addressed AI approaches such as expert systems or
meta-heuristic methods [13] such as genetic algorithms [7]. However, it is
normally difficult to employ these methods in practical and engineering uses.
This is because (1) a lot of time or high computational costs are needed, (2)
it is difficult to cover all unexpected situations, and (3) even small
modifications affect whole systems.

^2 Attributes in this paper include both condition and action parts of rules.

^3 As another reason for addressing the latter issue, our model (OCS) described
later can address the former issue but not the latter one.
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To overcome these problems and find new possibilities in scheduling domains, 
recent research has been done on (a) learning mechanisms, (b) rule-based sys- 
tems with evolutionary approaches, and (c) multiagent approaches. For instance, 
Zhang showed that a reinforcement learning approach found a good feasible 
schedule more quickly than Zweben’s approach [21] based on meta- heuristics 
methods [20]. Since this method could reduce the time for making schedules 
or computation costs by utilizing results acquired through learning, the above 
problem (1) could be solved. As another example, Tamaki showed the gener- 
ality/applicability of production systems with an evolutionary approach in the 
case of environmental changes [19]. This study showed the potential to cover 
unexpected situations in problem (2). Finally, Fujita and Iima showed that
multiagent approaches could find a good schedule in a reasonable time for
rescheduling problems [6,10], and Sen investigated the effect of many search methods for
scheduling problems in multiagent environments [15]. These studies contributed 
to solving problem (3). 

However, research in these three areas seems to have independently concen- 
trated on improvements in particular methods or techniques, in spite of the fact 
that these components complement each other. Therefore, our previous research 
analyzed what kinds of methods were needed to improve collective performance, 
and we found that an integration of the above three methods (i.e., learning
mechanisms, rule-based systems with evolutionary approaches, and multiagent 
approaches) showed better performance than any method alone [17,18]. From 
this result, this paper explores good rule design by using our model that
integrates these three methods from multi-strategic standpoints.

2.2 Rule Design 

This section starts by assuming that each job is an agent, and we design the if
and then parts based on this assumption.



IF Part: Rules in the if part are designed in two ways as shown in Fig. 1. 
This design is straightforward and simply considers the minimum requirements 
in given problems. In the first design, the if part is composed of the following 
components as shown in Fig. 1 (a). 

— Overlap: To increase the search range for finding good schedules, this design 
allows jobs to overlap with each other in the process of making schedules. 
The condition of overlap determines whether a job is overlapped or not by 
representing 1 or 0. Note that this condition may be omitted if jobs can be 
scheduled without overlapping. 

^ Since this paper investigates properties embedded in multiagent environments to 
address framework/guideline for rule design, a comparison with other conventional 
methods that involve standard theory is out of the scope of this paper. However, 
our previous research [17,18] at least compared results with those of each method 
(i.e., learning mechanisms, rule-based systems with evolutionary approaches, and
multiagent approaches). 
Fig. 1. IF Part

— Constraints: Since the number of constraints depends on the given problem, 
the same number of conditions are provided in the if part. For example, if the
number of constraints is n in a given problem, the n conditions are provided 
as shown in Fig. 1 (a). These conditions determine whether each constraint 
is satisfied or not by representing 1 or 0. 

Next, the second design of the if part is composed of the following components
as shown in Fig. 1 (b). 

— Overlap and Constraints: These are the same as the components in Fig. 1 
(a). 

— Same Situation: For more precision, this design includes the same situa- 
tion condition as shown in Fig. 1 (b). In detail, this condition determines 
whether a situation on overlapping/constraints in each job is the same at
a certain time or not by representing 1 or 0. For instance, if a job overlaps 
with others, some constraints are not satisfied, and this situation does not 
change at a certain time, the flag of “Same Situation” becomes 1. If either 
an overlapping or constraint situation changes, the flag becomes 0. Note that 
this flag becomes 0 even if the situation becomes worse (e.g., a job overlaps 
with others or the number of unsatisfied constraints increases) because the 
situation changes. 

From these designs, the total number of conditions in the if part is n + 1
in Fig. 1 (a) and n + 2 in Fig. 1 (b), where a given problem contains n
constraints.
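
The two if-part designs can be illustrated with a small sketch. It is only one possible reading of the description above: the `JobState` class, the `if_part` function, and the choice to compare the current situation with the previous step for the "same situation" flag are assumptions introduced here, not part of the original model.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class JobState:
    """Hypothetical snapshot of what one job (agent) observes at one step."""
    overlapped: bool        # True if the job overlaps another job
    satisfied: List[bool]   # one entry per constraint (n entries)


def if_part(state: JobState,
            previous: Optional[JobState] = None,
            with_same_situation: bool = False) -> Tuple[int, ...]:
    """Encode the condition part of a rule as a tuple of 0/1 flags."""
    flags = [int(state.overlapped)] + [int(ok) for ok in state.satisfied]
    if with_same_situation:
        # Assumption: "same situation" compares against the previous step.
        flags.append(int(previous is not None and previous == state))
    return tuple(flags)


state = JobState(overlapped=True, satisfied=[True, False, True])  # n = 3
design_a = if_part(state)                                          # n + 1 flags
design_b = if_part(state, previous=state, with_same_situation=True)  # n + 2 flags
```

With n = 3 constraints, `design_a` holds four flags and `design_b` holds five, matching the counts n + 1 and n + 2 stated above.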



THEN Part: Actions (Rules in then part) are designed in two ways as shown 
in Fig. 2. 

— Neighbors Search: To find a place that satisfies the overlapping conditions 
and constraints, each job moves its place to the left and right side until the 
new place satisfies the conditions better than the previous place. When we 
imagine that it is enough for jobs to consider only the overlapping condition, 
for instance, the job shown by a dashed square in Fig. 2 (a) moves its place 
to the left and right sides, and the job in this case finds a place without
overlapping (indicated by a gray box) after four steps of neighbors search on the
right side.
Fig. 2. THEN Part



Specifically, 2(n + 1) actions are designed in
the then part, and both the xth and (x + n + 1)th actions are in charge of
satisfying the same xth condition, where x = 1, ..., n + 1. These two actions
are categorized as follows.

• Selfish actions: In the former n + 1 actions called selfish actions, each 
job moves its place if a responsible condition is satisfied more than in 
the previous place. In this case, jobs move their places even if other 
conditions become worse. 

• Altruistic actions: In the latter n + 1 actions called altruistic actions, 
each job moves its place only if a responsible condition is satisfied more 
than in the previous place and does not decrease the number of other 
satisfied conditions. In this case, other conditions are at least kept or 
sometimes improved. 

— Left Search: With the neighbors search, an overlapping area is removed 
and all constraints are satisfied. However, there is no mechanism for finding 
places of jobs that minimize the total scheduling time. Therefore, the final 
action, the left search action, searches from the left side to find the left-most 
place where all conditions do not become worse than in the previous place. 
When we imagine the same case as in the example of neighbors search (i.e.,
jobs only consider the overlapping condition), for instance, the right-most
job with a dashed square in Fig. 2 (b) moves its place from the left
side, and the job in this case finds a place without overlapping (indicated by
a gray box) after four steps of left search. Note that this action does not search
places located to the right of the start location, in order to avoid a global
search.

These two searches in one job are terminated as shown in the above exam- 
ples of Fig. 2, but the moved job may be required to move again due to the 
movement of other jobs. Until all jobs satisfy their overlapping conditions and 
their constraints, these searches are continued. 
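
As a rough illustration of the two searches when only the overlapping condition is considered, the following sketch places one job on a discrete timeline. The function names, the `occupied` set of slots used by other jobs, the `max_shift` bound, and the choice to try the left neighbor before the right one are all assumptions made for this example.

```python
from typing import Set


def is_free(start: int, length: int, occupied: Set[int]) -> bool:
    """True if slots start .. start + length - 1 are not used by other jobs."""
    return start >= 0 and all(t not in occupied for t in range(start, start + length))


def neighbors_search(start: int, length: int, occupied: Set[int], max_shift: int) -> int:
    """Try places 1, 2, ... slots to the left and right until one is overlap-free."""
    for shift in range(1, max_shift + 1):
        for candidate in (start - shift, start + shift):
            if is_free(candidate, length, occupied):
                return candidate
    return start  # nothing better within max_shift: stay in place


def left_search(start: int, length: int, occupied: Set[int]) -> int:
    """Scan from the left edge and return the left-most overlap-free place.

    Places to the right of `start` are never examined.
    """
    for candidate in range(0, start + 1):
        if is_free(candidate, length, occupied):
            return candidate
    return start


others = {0, 1, 2, 3, 6, 7}                            # slots used by other jobs
moved = neighbors_search(2, 2, others, max_shift=10)   # overlapping job at slot 2
packed = left_search(9, 2, others)                     # job placed far to the right
```

In this toy timeline both calls settle on slot 4, the first gap of length 2. The full model would also check the n constraints, which this sketch leaves out.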

The total number of actions in the then part is 2(n + 1) + 1,
where a given problem contains n constraints. The quantity 2(n + 1) + 1
is the number of neighbors search actions (n + 1 selfish actions and n + 1
altruistic actions) plus the number of left search actions (1).
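
The action count and the difference between selfish and altruistic actions can be sketched as follows. This is an illustrative reading, not the authors' implementation: the names `build_actions`, `selfish_accepts`, and `altruistic_accepts` are hypothetical, indices start at 0, and each place is summarized by a vector whose entry k is 1 when condition k is satisfied (for k = 0, when the job does not overlap).

```python
from typing import List, Sequence, Tuple


def build_actions(n: int) -> List[Tuple[str, int]]:
    """Enumerate the 2(n + 1) + 1 primitive actions for n constraints.

    Condition 0 is the overlap condition; conditions 1..n are the constraints.
    """
    conditions = range(n + 1)
    selfish = [("selfish", k) for k in conditions]
    altruistic = [("altruistic", k) for k in conditions]
    return selfish + altruistic + [("left_search", -1)]  # -1: no single condition


def selfish_accepts(old: Sequence[int], new: Sequence[int], k: int) -> bool:
    """Accept a move if condition k improves, whatever happens to the others."""
    return new[k] > old[k]


def altruistic_accepts(old: Sequence[int], new: Sequence[int], k: int) -> bool:
    """Accept only if condition k improves and no fewer other conditions hold."""
    others_old = sum(old) - old[k]
    others_new = sum(new) - new[k]
    return new[k] > old[k] and others_new >= others_old


actions = build_actions(6)   # n = 6 gives 2(6 + 1) + 1 = 15 actions
```

With 0-based indexing, `actions[x]` and `actions[x + n + 1]` address the same condition, which mirrors the pairing of the xth and (x + n + 1)th actions in the text.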



3 Organizational-Learning Oriented Classifier System 

Our Organizational-learning oriented Classifier System (OCS) [18] has a GBML
(Genetics-Based Machine Learning) architecture. OCS is composed of many
Learning Classifier Systems (LCSs) [7,9], which are extended to introduce the
concepts of organizational learning (OL) ^ studied in organization and
management science [1,4,11]. An LCS is equipped with (1) an environmental
adaptation function via reinforcement learning mechanisms, (2-a) a problem solving
function via rule-based production systems, and (2-b) rule generation/exchange
mechanisms via genetic algorithms, and OCS is (3) a multiagent version of LCS.
Therefore, OCS employs (1) a learning mechanism, (2) rule-based systems with
evolutionary approaches, and (3) multiagent approaches. As mentioned in the
previous section, these three components contribute to improving
performance when they are integrated.

3.1 Aim of Agent and Function 

In OCS, agents (jobs in this paper) are implemented by their own LCSs and 
they divide given problems by acquiring their own appropriate functions through 
interaction among agents in order to solve problems that cannot be solved at 
an individual level ^. Based on this way of problem solving, the aim of the
agents is defined as finding appropriate functions. These functions are acquired
through the change in agents’ rule sets (i.e., rule base) and the change in the
strength ^ of rules, and thus a function is defined as a rule set. In particular, a
rule set drives a certain sequence of actions such as ABCBC • • •, in which the 
A, B and C actions are primitive actions. 

Note that the learning needed to acquire appropriate functions in some agents 
is affected by the function acquisition of other agents. For example, some agents 
are affected when one of the A, B, or C actions of other agents changes or 
when the fired order of the A, B, and C actions of other agents changes through 
learning. 

3.2 Architecture 

As shown in Fig. 3, OCS is composed of many agents, and each agent has the 
same architecture, which includes the following problem solver, memory, and 
mechanisms. In this model, each agent can recognize its own environmental state 
but cannot recognize the state of the total environment. 

^ A detailed introduction to the concepts of OL is discussed in [18]. 

® Note that we use the term “multiagent approaches” because OCS is composed of 
many agents, each of which is designed as one LCS. 

^ In the sense of the division of work, OCS can also be applied not only for multiagent 
problems but also for parallel search problems. In this paper, we consider OCS as 
an architecture for solving multiagent problems because OCS is based on many 
interactions among agents. 

® Strength in this paper is defined as the worth or weight of rules. 
Fig. 3. OCS Architecture



< Problem Solver > 

— Detector translates a part of an environment state into the internal state 
of an agent [14]. 

— Effector derives actions based on the internal state [14]. 

< Memory > 

— Organizational knowledge memory stores a set comprising each agent’s 
rule set as organizational knowledge. In OCS, this knowledge represents 
knowledge on the division of work.

— Individual knowledge memory stores a rule set (a set of CFs (classifiers)) 
as individual knowledge. In OCS, agents independently store different CFs 
that are composed of if-then rules with a strength factor. In particular, one 
primitive action is included in the then part. 

— Working memory stores the recognition results of sub-environmental 
states and also stores the internal state of an action of fired rules. 

— Rule sequence memory stores a sequence of fired rules in order to evaluate 
them. This memory is cleared after the evaluation. 



^ Although we showed the effectiveness of organizational knowledge in previous re- 
search [18], this knowledge is not used in this experiment because it has no relation 
to the three components mentioned in section 2.1. 
< Mechanisms >

— Roulette selection probabilistically selects one rule from plural rules that 
match a particular environment. In detail, one rule is selected according to 
the size of the strength attached to each rule. Since each rule includes one 
primitive action, one action is performed in each roulette selection. 

— Reinforcement learning, rule generation, rule exchange, and or- 
ganizational knowledge reuse mechanisms are reinterpreted from the 
four kinds of learning in OL (details, except for the organizational knowledge
reuse mechanism, are described later).
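
Roulette selection as described in the list above can be sketched with the standard library. The function name `roulette_select` is hypothetical, and the weight shift for zero or negative strengths is an assumption added here so that the sketch also works when strengths drop below zero; the source does not say how such values are handled.

```python
import random
from typing import Optional, Sequence


def roulette_select(strengths: Sequence[float],
                    rng: Optional[random.Random] = None) -> int:
    """Return the index of one matching rule, chosen in proportion to strength."""
    rng = rng or random.Random()
    lowest = min(strengths)
    # Assumption: shift the weights when any strength is zero or negative so
    # that every matching rule keeps a positive chance of being selected.
    offset = 1.0 - lowest if lowest <= 0 else 0.0
    weights = [s + offset for s in strengths]
    return rng.choices(range(len(strengths)), weights=weights, k=1)[0]


rng = random.Random(0)
picks = [roulette_select([1.0, 9.0], rng) for _ in range(2000)]
```

Over many draws the rule with strength 9.0 is picked far more often than the rule with strength 1.0, and each draw yields exactly one rule and therefore one primitive action.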



3.3 Learning in OCS 

Reinforcement learning mechanism: In OCS, the reinforcement learning 
(RL) mechanism enables agents to acquire their own appropriate actions that 
are required to solve given problems. In particular, RL supports learning the
appropriate order of the fired rules by changing the strength of the rules. In
detail, OCS employs a profit sharing method [8], which reinforces a sequence of
rules when agents obtain some rewards.
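
A minimal sketch of profit sharing over a sequence of fired rules is given below. The geometric decay of credit toward earlier rules is an assumption chosen for illustration; the credit assignment actually used in OCS is the one described in the cited work, and the names `profit_sharing`, `strength`, `fired`, and `decay` are hypothetical.

```python
from typing import Dict, List


def profit_sharing(strength: Dict[int, float], fired: List[int],
                   reward: float, decay: float = 0.5) -> None:
    """Reinforce every rule in the fired sequence once a reward arrives.

    Assumption: the last fired rule receives the full reward and each earlier
    rule receives `decay` times the credit of the rule that followed it.
    """
    credit = reward
    for rule_id in reversed(fired):
        strength[rule_id] += credit
        credit *= decay
    fired.clear()   # the rule sequence memory is cleared after evaluation


strength = {0: 0.0, 1: 0.0, 2: 0.0}
fired = [0, 1, 2]
profit_sharing(strength, fired, reward=8.0)
```

Because the whole sequence is reinforced at once, rules that fired early in a successful episode also gain strength, which is how the order of fired rules comes to be learned.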



Rule generation mechanism: The rule generation mechanism in OCS creates 
new rules when none of the stored rules matches the current environmental state. 
In particular, when the number of rules is MAX_CF (maximum number of rules),
the rule with the lowest strength is removed and a new rule is generated. In the
process of rule generation, the condition (if) part of a rule is created to reflect
the current situation, the action (then) part is determined at random, and the
strength value of the rule is set to the initial value. Furthermore, the strength of
the fired rule (e.g., the No. i rule) is temporarily decreased as
ST(i) = ST(i) - SC(i) if SC(i) is not 0. In this equation, ST(i) indicates the
strength of the No. i rule and SC(i) indicates the number of times the No. i rule
has been selected. In particular, SC(i) is counted when the No. i rule is fired and
is reset to 0 when the situation changes. By
this mechanism, the strength of fired rules is decreased as long as the situation
does not change, as in a deadlocked situation where the same rules are selected
repeatedly. Thus, these rules become candidates that may be replaced by new
rules. However, the strength of these rules is recovered when the situation
changes.
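
The rule generation mechanism can be sketched as below. Several details are assumptions made for this example: the `Classifier` fields, the value of `INITIAL_STRENGTH`, the `penalty` field used to restore strength later, and the choice to increment SC(i) before applying ST(i) = ST(i) - SC(i).

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

MAX_CF = 50              # maximum number of rules (value from section 5.2)
INITIAL_STRENGTH = 1.0   # hypothetical; the source does not give the value


@dataclass
class Classifier:
    condition: Tuple[int, ...]
    action: int
    strength: float = INITIAL_STRENGTH   # ST(i)
    selected_count: int = 0              # SC(i)
    penalty: float = 0.0                 # total temporary decrease so far


def generate_rule(rules: List[Classifier], situation: Tuple[int, ...],
                  n_actions: int, rng: random.Random) -> Classifier:
    """Create a rule for an unmatched situation, evicting the weakest if full."""
    if len(rules) >= MAX_CF:
        rules.remove(min(rules, key=lambda r: r.strength))
    new_rule = Classifier(condition=situation, action=rng.randrange(n_actions))
    rules.append(new_rule)
    return new_rule


def on_fire(rule: Classifier) -> None:
    """Count the firing, then apply ST(i) = ST(i) - SC(i)."""
    rule.selected_count += 1
    rule.strength -= rule.selected_count
    rule.penalty += rule.selected_count


def on_situation_change(rule: Classifier) -> None:
    """Recover the temporarily removed strength and reset SC(i) to 0."""
    rule.strength += rule.penalty
    rule.penalty = 0.0
    rule.selected_count = 0
```

A rule that keeps firing in an unchanged situation loses 1, then 2, then 3 units of strength, so it quickly becomes the weakest rule and a candidate for replacement, yet it regains its strength as soon as the situation changes.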



Rule exchange mechanism: In OCS, agents exchange rules with other agents 
at a particular time interval (CROSSOVER_STEP^^) in order to solve given problems

In previous research [18], we showed the effectiveness of the organizational knowledge 
reuse mechanism that utilizes a set comprising each agent’s rule set before agents 
solve given problems. However, this mechanism is not used in this experiment for 
the same reason of not using organizational knowledge. 

The detailed credit assignment in OCS was proposed in [16]. 

This step is defined in section 4.2. 
Fig. 4. Rule exchange mechanism



that cannot be solved at an individual level. In this mechanism, a particular
number ((the number of rules) × GENERATION_GAP^^) of rules with low strength
values are replaced by rules with high strength values between two arbitrary
agents. For example, when agents X and Y are selected as shown in Fig. 4,
the CFs in each agent are sorted in order of their strength values (upper CFs
have high strength values). After sorting, CF_{j-2} ~ CF_j and CF'_{j-2} ~ CF'_j in
this case are replaced by CF'_1 ~ CF'_3 and CF_1 ~ CF_3, respectively. However,
rules that have a higher strength value than a particular value (BORDER_ST) are
not replaced, to avoid unnecessary crossover operations. The strength values of
replaced rules are reset to their initial values. This is because effective rules in
some agents are not always effective for other agents in multiagent environments.
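
The exchange between two agents can be sketched as follows. The `Rule` class, the default `initial_strength`, and the way each weak slot is paired with a donor rule are assumptions for illustration; only the sorting, the GENERATION_GAP fraction, the BORDER_ST guard, and the strength reset come from the description above.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass
class Rule:
    condition: Tuple[int, ...]
    action: int
    strength: float


def exchange_rules(x: List[Rule], y: List[Rule], generation_gap: float = 0.10,
                   border_st: float = -50.0, initial_strength: float = 0.0) -> None:
    """Replace the weakest rules of each agent with copies of the other's best."""
    x.sort(key=lambda r: r.strength, reverse=True)
    y.sort(key=lambda r: r.strength, reverse=True)
    k = int(min(len(x), len(y)) * generation_gap)
    if k == 0:
        return
    # Copies start from the initial strength: a rule that works well for one
    # agent is not assumed to work well for the other.
    best_x = [replace(r, strength=initial_strength) for r in x[:k]]
    best_y = [replace(r, strength=initial_strength) for r in y[:k]]
    for rules, donors in ((x, best_y), (y, best_x)):
        for offset, donor in enumerate(donors):
            slot = len(rules) - k + offset
            if rules[slot].strength <= border_st:   # stronger rules are kept
                rules[slot] = donor


x = [Rule((i,), i, s) for i, s in
     enumerate([10, 5, 0, -5, -10, -20, -30, -40, -60, -70])]
y = [Rule((i,), 100 + i, s) for i, s in
     enumerate([20, 8, 1, -1, -2, -3, -4, -5, -6, -80])]
exchange_rules(x, y, generation_gap=0.2)
```

In this example agent X loses its two weakest rules, while agent Y keeps the rule with strength -6 because it lies above the border value and replaces only the rule with strength -80.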



3.4 Supplemental Setup 

In addition to the above mechanisms, OCS is set up as follows. Initially, a par- 
ticular number (FIRST_CF) of rules in each agent is generated at random, and 
the strength values of all rules are set to the same initial value. 



4 Crew Task Scheduling 

4.1 Problem Description 

The crew task scheduling of a space shuttle/station is a job-shop scheduling 
problem where many jobs for the crews must be scheduled under hard resource 
constraints. The goal of this problem is to find feasible schedules that minimize 
the total scheduling time of all jobs. We selected this domain because (1) this 
problem can be considered as a multiagent problem when one job is assumed 
as one agent and (2) a systemization of this problem is required to support
schedulers at ground stations. In this task, there are several missions that are 
composed of jobs, and these jobs should be assigned while satisfying the following 
constraints to accomplish the missions. 
1. Power of space shuttle/station: Each job requires a particular power
(from 0% to 100%) in the experiments, but the summation of the power of 
all jobs at each time must not be more than 100%. 

2. Link to the ground station: Some jobs need to use a link to the ground 
station, but only one job can use it at each time. Due to the orbit of the 
spacecraft, none of the jobs can use the link during a certain time. 

3. Machine A: Some jobs need to use a machine A in the experiments, but 
only one job can use it at each time. Examples of such machines involve 
computers, voice recorders, and so on. 

4. Machine B: This condition is the same as that for machine A. 

5. Order of jobs: In a mission unit, jobs have an order (from 1 to the total 
number of jobs where a smaller number means a higher order). Jobs in each 
mission must be scheduled according to their respective orders. 

6. Crew assignment types: The crew is divided into the following two types: 
Mission Specialist (MS) and Payload Specialist (PS). The former is mainly 
in charge of experiments, and the latter supports experiments. For each
job, one of the following crew assignment types is decided: (a) Anybody, (b)
PS only (PS is not specified), (c) One specified PS with somebody, (d) One 
specified MS with somebody, and (e) Combination of PS and MS (PS and 
MS are not specified). These types are based on the space shuttle missions. 

In addition to the above six elements, “the length” and “the required number 
of crew members” are decided for each job in advance. 
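
Two of the six constraints, power and the link to the ground station, can be checked on a discrete timeline as in the sketch below. The `Job` fields, the function names, and the `blackout` set of time slots in which the link is unavailable are hypothetical; machines A and B could be checked in the same way as the link, without the blackout.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass
class Job:
    start: int
    length: int
    power: int               # required power in percent (0 to 100)
    uses_link: bool = False


def power_ok(jobs: Iterable[Job], limit: int = 100) -> bool:
    """The summed power of all jobs must not exceed the limit at any time."""
    load: Dict[int, int] = {}
    for job in jobs:
        for t in range(job.start, job.start + job.length):
            load[t] = load.get(t, 0) + job.power
    return all(total <= limit for total in load.values())


def link_ok(jobs: Iterable[Job], blackout: Set[int]) -> bool:
    """Only one job may use the link at a time, and never during a blackout."""
    used: Set[int] = set()
    for job in jobs:
        if not job.uses_link:
            continue
        for t in range(job.start, job.start + job.length):
            if t in blackout or t in used:
                return False
            used.add(t)
    return True


overloaded = [Job(0, 3, 60), Job(2, 2, 50)]   # 110% at time 2
feasible = [Job(0, 3, 60), Job(3, 2, 50)]
```

Each check returns a Boolean, which corresponds to the 1 or 0 that a constraint flag in the if part would take.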



4.2 Problem Setting 

In this task, each job is designed as an agent in OCS, and each job learns to 
acquire an appropriate sequence of actions that minimizes the total scheduling 
time. Specifically, jobs have 15 primitive actions, such as movements that satisfy 
power constraints. Note that the number of actions (15) is derived from the
equation 2(n + 1) + 1 described in section 2.2, where n (= 6) is the number of
constraints in crew task scheduling. Furthermore, all actions are based on the
rule design described in the same section.

In a concrete problem setting, all jobs are initially placed at random without 
considering overlaps or the six constraints described in the previous section. This 
is because neither the jobs nor we know where the best place is for each job to 
minimize the total scheduling time in advance. Due to this random placement, 
a schedule is not feasible at this time. After this initial placement, the jobs start 
to perform some primitive actions in order to reduce overlapping areas or to 
satisfy the constraints while minimizing the total scheduling time. When the 
value of the total time converges with a feasible schedule, all jobs evaluate their 
own sequences of actions according to the value of the total time. Then, the 
jobs restart from the initial placement to acquire more appropriate sequences of 
actions that find shorter times. In this cycle, one step is counted when all jobs 
perform one primitive action, and one iteration is counted when jobs restart 
from the initial placement. 
4.3 Index of Evaluation

In this task, the following two indexes are evaluated: 

- Goodness = (total scheduling time) - (minimum scheduling time)

- Computational cost = Σ_{i=1}^{iteration in convergence} step(i)

The first index (goodness) evaluates a solution to a feasible schedule, and
the second index (computational cost) calculates the accumulated steps. In the
former index, the minimum scheduling time is calculated by hand in advance. In
the latter index, on the other hand, “step(i)” and “iteration in convergence”
respectively indicate the steps counted in the ith iteration and the iteration at
which the value of the total scheduling time converges through repetitions that
attempt to find shorter times from the initial placement. This convergence is
recognized when the total time shows the same value in particular iterations.
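
The two indexes can be computed as in the following sketch. The `window` parameter, which fixes how many consecutive iterations must show the same total time before convergence is recognized, is an assumption, as are the sample numbers and the function names.

```python
from typing import Optional, Sequence


def goodness(total_time: int, minimum_time: int) -> int:
    """Distance of a feasible schedule from the hand-computed minimum time."""
    return total_time - minimum_time


def iteration_in_convergence(total_times: Sequence[int],
                             window: int = 3) -> Optional[int]:
    """1-based iteration at which the total time has repeated `window` times."""
    for end in range(window, len(total_times) + 1):
        if len(set(total_times[end - window:end])) == 1:
            return end
    return None   # no convergence within the recorded iterations


def computational_cost(steps: Sequence[int], converged_at: int) -> int:
    """Accumulated steps over iterations 1 .. converged_at."""
    return sum(steps[:converged_at])


total_times = [40, 36, 33, 33, 33, 33]   # total scheduling time per iteration
steps = [120, 90, 70, 65, 60, 60]        # step(i) per iteration
converged_at = iteration_in_convergence(total_times)
cost = computational_cost(steps, converged_at)
```

With these sample values the total time repeats three times by iteration 5, so the cost accumulates the steps of the first five iterations.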



5 Simulation 

5.1 Experimental Design 

A simulation was used to investigate the effect of the proposed rule design for 
multiple learning agents by evaluating both goodness and computational cost. In 
detail, the following seven cases are tested in five examples that involve from 10 
to 12 jobs. 

- Cases 1, 2, 3 : L, A, S 

- Cases 4, 5, 6 : LA, LS, AS 

- Case 7 : LAS

In the above cases, L, A, and S respectively indicate the left search action,
altruistic action, and selfish action mentioned in section 2.2. In the case of
integrating the left search action with other actions (cases 4, 5, and 7), the left
search action for each job is only executed if its own overlapping area is removed
and its own constraints are satisfied. This is because this action may
not reduce an overlapping area or satisfy other constraints due to a limitation in
the search range (this action cannot search the right side). Furthermore, in the
case of integrating the altruistic actions with the selfish actions (cases 6 and 7),
the altruistic actions are executed while the situations of jobs change within a
certain time, and the selfish actions, on the other hand, are executed when
situations do not change within a certain time. This is because many of the selfish
actions obviously do not contribute to reducing the computational cost. Thus,
the experiments with cases 1 to 5 employ type 1 shown in Fig. 1 (a) as
the if part, and those with cases 6 and 7 employ type 2 shown in Fig. 1
(b). These types are described in section 2.2.
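
The gating of actions in the seven cases can be summarized in a short sketch. The function names and the Boolean arguments are hypothetical, and the sketch only returns which kinds of action are permitted; which rule actually fires is still decided by the learned strengths.

```python
from typing import Set

CASES = {1: "L", 2: "A", 3: "S", 4: "LA", 5: "LS", 6: "AS", 7: "LAS"}


def if_part_type(case: str) -> str:
    """Type (b) with the same-situation flag is needed only when A and S mix."""
    return "b" if "A" in case and "S" in case else "a"


def allowed_kinds(case: str, overlapped: bool, constraints_ok: bool,
                  same_situation: bool) -> Set[str]:
    """Kinds of action a job may execute in its current situation."""
    kinds: Set[str] = set()
    if "A" in case and "S" in case:
        # Altruistic while the situation keeps changing, selfish once it is stuck.
        kinds.add("selfish" if same_situation else "altruistic")
    elif "A" in case:
        kinds.add("altruistic")
    elif "S" in case:
        kinds.add("selfish")
    placed = not overlapped and constraints_ok
    # Combined with other actions, left search runs only for a placed job.
    if "L" in case and (case == "L" or placed):
        kinds.add("left_search")
    return kinds
```

For example, in case LAS a job that is stuck in an overlapping situation is restricted to selfish actions, while a job that is already placed may use the left search.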
5.2 Experimental Setup

The variables in our model are designed as follows for the crew task scheduling: 
FIRST_CF (the number of initial rules) is 25, MAX_CF (the maximum number of
rules) is 50, CROSSOVER_STEP (the interval steps for crossover operations) is 10,
GENERATION_GAP (the percentage of operated rules) is 10%, and BORDER_ST (the
lowest strength of the rule not for removal) is -50.0.

5.3 Experimental Results 

Fig. 5 shows the goodness (the total scheduling time - the minimum scheduling
time) and the computational cost (the accumulated steps). The vertical axis shows
these indexes and the horizontal axis shows the seven cases in the experiments.
In this figure, white and black boxes respectively indicate the result of goodness
and computational cost, and a large cross indicates the results when a feasible
solution could not be found. All results are averaged from five different examples
of schedules. In each example, the total number of jobs and the requirements
in each job (“length,” “necessary power,” and so on) are different.




5.4 Discussion 

(1) Goodness and Computational Cost in Three Actions 

— Left search action: From the results as shown in Fig. 5, we find that the 
goodness without the left search action (in the cases of A, S, and AS) is 

Note that (1) all parameters were decided through careful preliminary examinations 
to effectively show the effect of proposed rule design, and (2) the tendency of results 
did not change drastically according to the parameter setting. 

This corresponds to the average of five situations with different random seeds in one 
example. 
worse than that with others. This is because there is no mechanism in these
cases for finding the places of jobs that minimize the total scheduling time. 
However, jobs cannot be placed when only the left search action is employed. 
This is because this action cannot search the right side, and this limitation 
makes it hard to reduce an overlapping area or satisfy other constraints in 
a fixed search range. These results suggest that an integration of the left 
search action with other actions improves the goodness, that is, it finds a 
shorter total scheduling time. 

— Altruistic and selfish actions: Fig. 5 indicates that the computational 
costs in the cases of A and LA are small, those in the cases of S and LS are 
large, and that in the case of AS is moderate. These results were obtained
for the following reasons: (1) the altruistic actions work as exploitation
factors that utilize current schedules, and thus this contributes to reducing 
the computational costs; (2) the selfish actions work as exploration factors 
that promote change in the current schedules, and thus this increases the 
computational costs; and (3) the AS integrates exploitation and exploration 
factors, and thus the computational costs are intermediary between the case 
of A, LA and the case of S, LS. 

— Integration of three actions: From the above two analyses, the left search 
action improves the goodness and the altruistic action reduces the computa- 
tional costs. Furthermore, an integration of both actions improves the good- 
ness with small computational costs as shown in the case of LA. However, 
the performance (both goodness and computational costs) of LAS is better 
than that of LA. This is because an introduction of an exploration factor 
(the selfish action) into LA contributes to (1) finding new good solutions 
(i.e., an improvement of the goodness) by getting out of the local minimum 
ones and (2) reducing the computational costs by getting out of deadlocked
situations, such as a job moving back and forth to find locations that
satisfy conditions. However, we must be careful in employing this factor
because it simply explores the search space. Due to this property, the selfish
action becomes effective when it is integrated with other actions. Therefore, 
an integration of the three actions improves the performance more than LA 
does. 



(2) Effectiveness of LAS 

From the above analysis, we found that an integration of the left search action, 
the altruistic action, and the selfish action contributes to improving the goodness 
with small computational costs. This indicates that it is important to introduce 
a factor for improving solutions into the situation where both exploitation and 
exploration factors interact with each other in order to improve performance.
Although March claimed that there is an important balance between exploitation 
and exploration in making an organization grow in the context of organization 
and management science [11], he did not mention how and where this growth 
would occur. As one answer to this question, we claim that it is important to
introduce another axis that focuses on improvement of solutions into an interaction
of exploitation and exploration.

From these discussions, this research has arrived at the following conclusions: 
(1) an integration of the above three factors is required to design good rules in 
multiagent environments where not only attributes in rules but also interactions 
among agents affect collective performance; and (2) the contribution of an inte- 
gration of three factors is supported by the design of a condition part in rules 
that are composed of flags indicating overlapping, constraints, and same situ- 
ation conditions as described in section 2.2. These conclusions imply that it is 
important to consider the design of both the condition (IF) and action (THEN) 
parts and also imply that our design is feasible as a framework or guideline for 
rule design in multiple learning agents. As another advantage of this rule design, 
our design can be applied to not only scheduling problems but also to CSPs 
(constraint satisfaction problems) that search for good solutions while satisfying all
constraints with small computational costs. 

6 Conclusion 

This paper has explored how to design good rules for multiple learning agents in 
scheduling problems and has shown the effectiveness of a proposed rule design 
through an example of crew task scheduling on a space shuttle/station. The 
main results are summarized as follows: (1) an integration of (a) a solution 
improvement factor, (b) an exploitation factor, and (c) an exploration factor 
contributes to finding good solutions with small computational costs; and (2) the 
condition part of rules, which includes flags indicating overlapping, constraints, 
and same situation conditions, supports the contribution of the above three 
factors. Future research will include an analysis of effective components, such 
as the above three factors, and will investigate their integrated effectiveness 
in scheduling domains. A comparison with conventional methods that involve 
scheduling theory must also be made in future work. 
Abstract. Agent approaches have been increasingly used within information
technology to describe various computational entities. In particular,
due to the proliferation of readily available text databases on the Web, 
agents have been often developed as the computational entities for dis- 
covering useful text databases on the Web. In this paper, we motivate 
the need for the hierarchical organization of those agents. The motiva- 
tion is based on our experiences with the neural net agents for the text 
database discovery and an analysis of the tradeoff between the benefit 
of the hierarchical organization of agents and multi-agent coordination 
overhead. We first introduce the neural net agent and then motivate our 
multi-agent approach based on the hierarchical organization of neural 
net agents both analytically and experimentally. 



1 Introduction 

As the number and diversity of text databases on the Web increases rapidly, 
users are faced with finding the databases that are relevant to the user query. 
Identifying the relevant databases for a given query is the text database discovery 
problem (Gravano, Garcia-Molina, and Tomasic 1994). 

Recently, to solve the text database discovery problem, several statistical 
approaches (Gravano, Garcia-Molina, and Tomasic 1994; Gravano and Garcia- 
Molina 1995; Salton 1971) have been introduced. Although the methods em- 
ployed in these approaches can be used to estimate the number of relevant docu- 
ments in a text database, they are based on very restrictive assumptions regard- 
ing the distribution of terms over the documents in the text database (Meng, 
Liu, Yu, Wang, Chang, and Rishe 1998). Furthermore, statistical approaches
that ignore the user’s feedback are not efficient in practice because a relevant
document is one that the user issuing the query is interested in, and thus
the relevance of documents can be determined only by the user.

To reflect the user’s feedback for the text database discovery, learning ap- 
proaches have been also introduced. Savvy Search (Howe and Dreilinger 1997) 
is the most popular text database discovery system which is able to learn from
the observations of the users’ search results. However, this system employs a 
simple reinforcement learning scheme which does not consider any correlation 
between query terms for adjusting the associative weights, and thus it often fails 
to discover good text databases, especially for queries with multiple terms.

To remedy the inadequacies of the above text database discovery approaches,
we proposed a neural net agent approach (Choi and Yoo 1998). In this approach, 
an internal neural network mechanism of a neural net agent discovers the text 
databases associated with relevant documents for a given query and then re- 
trieves the relevant documents from those text databases. We also showed how 
well our neural net agent works compared to conventional approaches (Choi and 
Yoo 1999). Our approach, however, suffers from a scalability problem. That
is, as the number of available text databases increases beyond a tolerable limit, the
neural net agent has difficulty in training its neural network. This problem
corresponds to the so-called “bounded rationality” (Sargent 1993) of the
single-agent approach.

To overcome this difficulty, in this paper, we propose a hierarchical
organization of neural net agents in which populations of neural net agents
learn about the underlying text databases collaboratively so that they can
retrieve the desired documents effectively from the distributed text databases.
The hierarchical organization of neural net agents reduces the training cost to
an acceptable level at the expense of some communication overhead between
neural net agents.

In Section 2, we describe the neural net agent briefly. In Section 3, we
motivate the multi-agent approach. We first propose the hierarchical
organization of neural net agents, and then describe the training procedure for
collaborative document retrieval. We highlight the trade-off between the extra
communication cost and the improved training cost of the proposed hierarchical
organization. In Section 4, we evaluate our hierarchically organized
multi-agent approach with various performance measurements of an experimental
system and compare them to those obtained by the single-agent approach.

2 Neural Net Agent 

The text database discovery problem occurs when there are multiple text
databases and some of them need to be selected for information retrieval (IR).
In the environment of our neural net agent, there are multiple text databases
and each of them receives a query and submits some documents potentially
relevant to that query based on its own document index. As shown in Figure 1,
the neural net agent sends a given query to available text databases and then
receives the documents potentially relevant to that query from them. Figure 1
shows the main components of a neural net agent and the control flows among
them. Thus, a neural net agent $a$ is defined by the 6-tuple
$a = \langle QB, IM, RF, TG, LM, QS \rangle$. Each component of the tuple is
described as follows:

Query Broadcaster (QB) broadcasts a given query to available text
databases in order to receive all documents potentially relevant to that query
from them.







Information Merger (IM) merges the documents submitted by the text 
databases, checks for duplicates, and then presents the merged results, the union 
of submitted documents, to the issuer of the query. 

Relevance Feedback (RF) receives the user’s judgment for the documents
presented by the IM for a given query $q$ and then generates a relevance
vector $C_q = (c_{q_1}, c_{q_2}, \ldots, c_{q_M})$, where $D$ is an ordered set
of the available text databases, $D = \langle d_1, d_2, \ldots, d_M \rangle$,
and $m_{q_i}$ is the number of documents submitted by $d_i$ and judged to be
relevant to the given query $q$ by the user.

Fig. 1. Framework of the Neural Net Agent



Term-vector Generator (TG) transforms a query $q$, which is expressed as a set
of index terms by eliminating non-content words and stemming plural nouns to
their singular form and inflected verbs to their base form (Salton and McGill
1983), into a binary vector representation
$S_q = (s_{q_1}, s_{q_2}, \ldots, s_{q_N})$. When $T$ is an ordered set of all
the index terms, $T = \langle t_1, t_2, \ldots, t_N \rangle$,

$$s_{q_i} = \begin{cases} 1 & \text{if } t_i \in q \\ 0 & \text{otherwise} \end{cases}$$

for $i = 1, 2, \ldots, N$.
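
The mapping performed by the TG can be illustrated with a minimal sketch. It
assumes the query has already been reduced to stemmed index terms; the function
name `term_vector` and the five-term index list are hypothetical and chosen
only for illustration.

```python
def term_vector(query_terms, index_terms):
    """Return the binary vector S_q: component i is 1 iff index term t_i is in q."""
    present = set(query_terms)
    return [1 if t in present else 0 for t in index_terms]


# Hypothetical ordered index-term set T and a query q given as stemmed terms.
T = ["agent", "database", "neural", "retrieval", "text"]
q = {"neural", "agent"}

S_q = term_vector(q, T)
print(S_q)  # [1, 0, 1, 0, 0]
```

The length of the vector is fixed by the index-term set, not by the query, so
every query presents an input of the same size to the BPN.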

Learning Mechanism (LM) is used to learn from the user’s relevance feedback.
Each agent has its Learning Mechanism in the form of a neural network
associative memory, shown in Figure 1 as the shaded rectangle. BPN is adopted
for this neural network associative memory to take advantage of its feature
extraction and generalization properties for text database discovery. Figure 2
shows that the BPN of a neural net agent acts in two phases: a training phase
and a recall phase.

During the training phase, the input and output layers of BPN are set to
represent a training pair $(S_q, C_q)$, where $S_q$ is produced by the TG and
$C_q$ is produced by the RF for the given query. The well-known BPN learning
procedure (Freeman and Skapura 1992) is performed for all training pairs made
of the outputs produced by the TG and the RF for the given training queries.
The BPN learning procedure repeatedly adjusts the link-weight matrices of BPN
in a way that minimizes the error for each training pair. When the average
squared error computed over all training pairs is acceptably small, the BPN
learning procedure stops and produces the link-weight matrices, which are
stored as the knowledge for text database discovery.

During the recall phase, the input layer of BPN is activated by a term-vector
produced by the TG for a newly given query. This activation of BPN spreads
from the input layer to the output layer using the link-weight matrices learned
during the training phase. This spreading activation produces, as the output of
BPN, a vector representation whose components are all between 0 and 1.
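
The recall phase can be sketched as a plain forward pass. The sketch below is
an illustration only: it assumes a single hidden layer and the logistic
sigmoid as the unit activation (a common choice that keeps every output
strictly between 0 and 1), and it uses random, untrained weights and small
hypothetical layer sizes just to show the data flow.

```python
import math
import random


def sigmoid(x):
    """Logistic activation (assumed here); maps any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer: weights[j][i] links input i to unit j."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def recall(s_q, w_hidden, b_hidden, w_out, b_out):
    """Spread activation from the input layer through the hidden layer to the output."""
    hidden = layer(s_q, w_hidden, b_hidden)
    return layer(hidden, w_out, b_out)


# Hypothetical sizes and random (untrained) link weights.
random.seed(0)
n_terms, n_hidden, n_dbs = 5, 3, 4
w_h = [[random.uniform(-1, 1) for _ in range(n_terms)] for _ in range(n_hidden)]
b_h = [0.0] * n_hidden
w_o = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_dbs)]
b_o = [0.0] * n_dbs

o_q = recall([1, 0, 1, 0, 0], w_h, b_h, w_o, b_o)
print(o_q)  # one value per text database, each strictly between 0 and 1
```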

Fig. 2. Training and Recall Phase of BPN

Query Sender (QS) sends the given query selectively to the text databases
according to the output of the BPN recall phase. Let $D$ be an ordered set of
available text databases, $D = \langle d_1, d_2, \ldots, d_M \rangle$, let
$O_q = (o_{q_1}, o_{q_2}, \ldots, o_{q_M})$ be an output vector of the BPN
recall phase for the given query $q$, and let $r$ be a threshold constant such
that $0 < r < 1$. Then, the QS sends $q$ to $d_i$ iff $o_{q_i} > r$, for
$i = 1, 2, \ldots, M$.
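
The selection rule of the QS amounts to a component-wise threshold test. A
minimal sketch follows; the function name `select_databases` and the output
values are hypothetical.

```python
def select_databases(o_q, databases, r):
    """Return the databases d_i whose recall-phase output o_{q_i} exceeds r."""
    if not 0 < r < 1:
        raise ValueError("threshold r must satisfy 0 < r < 1")
    return [d for d, o in zip(databases, o_q) if o > r]


D = ["d1", "d2", "d3", "d4"]
o_q = [0.91, 0.12, 0.48, 0.05]  # hypothetical output of the BPN recall phase

print(select_databases(o_q, D, 0.3))  # ['d1', 'd3']
print(select_databases(o_q, D, 0.6))  # ['d1']
```

Raising the threshold sends the query to fewer databases, which is the knob
that trades recall for precision and communication cost.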

3 Multi-agent Hierarchy 

In principle, our approach offers a potential solution to the text database
discovery problem. For that potential to be fully realized, however, the neural
net agent must scale with the increasing number of text databases. As the
number of available text databases grows beyond a tolerable limit, the neural
net agent may have difficulty training its BPN. For a larger number of text
databases, the BPN learning mechanism of the neural net agent must extract more
features^1 to associate each query with its related text databases, and thus
the computational task of training the BPN increases in size and complexity.
Tesauro and Janssens (1988) demonstrated that the training cost of BPN scales
exponentially with the complexity of its computational task. Earlier, Minsky
and Papert (1969) claimed that many neural networks suffer undesirable effects
when scaled up to a large size. Therefore, it is not feasible for a single
neural net agent to learn about too many text databases. Furthermore, if a new
text database is added to the existing text databases, the neural net agent
must be retrained for all the text databases even though it needs to learn
about only the new one. Thus, when a designer scales up an existing single
neural net agent into a larger one with more text databases, the redundant
training cost is quite burdensome in practice. In this section, we propose our
multi-agent approach to overcome these difficulties. In this approach, the
knowledge for text database discovery is distributed over a number of
hierarchically organized neural net agents, and the IR process is performed
collaboratively by those agents.

3.1 Hierarchically Organized Multi-agent IR System 

Suppose that we have a number of neural net agents, each of which has its 
own available text databases as we described in the previous section. We can 
now build a higher-level neural net agent that has the neural net agents as the 
subordinate neural net agents in the same way as the neural net agents have 
their available text databases. Also, using the same principles, we can construct 
the deeper hierarchy of neural net agents. The key point is to notice that the 
higher-level neural net agent can treat the subordinate neural net agents in 
the same way as the subordinate neural net agents treat their available text 
databases. Therefore, the query issuer of the subordinate neural net agents will
be the higher-level neural net agent.

^1 A feature is defined as a term or a coherent set of terms that distinguishes
some text databases from others.

To formally represent such a hierarchy of neural net agents, we first define a
multi-agent IR system consisting of a number of text databases and the neural
net agents retrieving documents from those text databases.

Definition 1. A multi-agent IR system is a 3-tuple
$M = \langle A, D, R \rangle$ where $A$ is a set of neural net agents, $D$ is a
set of text databases, $R$ is a binary relation on $A \times (A \cup D)$ such
that $\langle x, y \rangle \in R$ iff $x$ is the query issuer of $y$, and for
any $x \in A$, there exists some $\langle x, y \rangle \in R$.

The multi-agent IR system $M = \langle A, D, R \rangle$ can be represented as
the directed graph $M' = \langle A \cup D, R \rangle$, where the set
$A \cup D$ is the set of nodes and the elements of $R$ are edges. A
hierarchically organized multi-agent IR system can be similarly defined as
follows.

Definition 2. A multi-agent IR system $M = \langle A, D, R \rangle$ is
hierarchically organized if the directed graph
$M' = \langle A \cup D, R \rangle$ is a tree.

For example, suppose a multi-agent IR system $M = \langle A, D, R \rangle$ with
$A = \{a_1, a_2, \ldots, a_5\}$, $D = \{d_1, d_2, \ldots, d_{16}\}$ and
$R = \{\langle a_1, a_2 \rangle, \langle a_1, a_3 \rangle,
\langle a_1, a_4 \rangle, \langle a_1, a_5 \rangle, \ldots,
\langle a_5, d_{13} \rangle, \langle a_5, d_{14} \rangle,
\langle a_5, d_{15} \rangle, \langle a_5, d_{16} \rangle\}$. Then the directed
graph $M' = \langle A \cup D, R \rangle$ shown in Figure 3 is a tree in which
the parent of each node is the query issuer of that node. Thus $M$ is a
hierarchically organized multi-agent IR system.
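
Definitions 1 and 2 can be checked mechanically. The sketch below is an
illustration, not part of the original system: the function name is
hypothetical, and the pairs elided in the example above are filled in by
assuming that a2, a3, a4 and a5 each issue queries to consecutive groups of
four text databases (only the pairs for a1 and a5 are given explicitly).

```python
def is_hierarchically_organized(agents, databases, relation):
    """Check Definition 1 and that the directed graph <A ∪ D, R> is a rooted tree.

    relation is a set of (issuer, receiver) pairs.
    """
    agents, databases = set(agents), set(databases)
    nodes = agents | databases
    children = {n: [] for n in nodes}
    parent_count = {n: 0 for n in nodes}
    for issuer, receiver in relation:
        # R must be a relation on A x (A ∪ D).
        if issuer not in agents or receiver not in nodes:
            return False
        children[issuer].append(receiver)
        parent_count[receiver] += 1
    # Definition 1: every agent issues queries to at least one node.
    if not all(children[a] for a in agents):
        return False
    # A rooted tree has exactly one node without a parent, and no node has two.
    roots = [n for n in nodes if parent_count[n] == 0]
    if len(roots) != 1 or any(c > 1 for c in parent_count.values()):
        return False
    # Every node must be reachable from the root (rules out detached cycles).
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        seen.add(node)
        stack.extend(children[node])
    return seen == nodes


A = {f"a{j}" for j in range(1, 6)}
D = {f"d{i}" for i in range(1, 17)}
R = {("a1", f"a{j}") for j in range(2, 6)}
# Assumed grouping: a2 -> d1..d4, a3 -> d5..d8, a4 -> d9..d12, a5 -> d13..d16.
R |= {(f"a{j}", f"d{4 * (j - 2) + i}") for j in range(2, 6) for i in range(1, 5)}

print(is_hierarchically_organized(A, D, R))                  # True
print(is_hierarchically_organized(A, D, R | {("a2", "d5")}))  # False: d5 has two issuers
```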




[Figure 3: tree whose nodes are neural net agents and text databases]

Fig. 3. Hierarchical Organization of Multi-Agent IR System



In many cases, text databases are categorized according to their document
topics, where the topics are organized in a hierarchy of increasing
specificity (Koller and Sahami 1997). Therefore, the topology of a
hierarchically organized multi-agent IR system can be determined in advance
according to such a topical hierarchy. This can be accomplished by the simple
principle that text databases whose document topics are close to each other in
the topical hierarchy should be close to each other in the tree structure of
the hierarchically organized multi-agent IR system.

Such a hierarchical organization of neural net agents divides a text database
discovery task into a set of smaller subtasks corresponding to the splits in
the topical hierarchy of the text databases. Each of these subtasks is
significantly simpler than the original task, since each neural net agent in
the hierarchical organization needs to distinguish between only a small number
of categories. Therefore, each neural net agent in the hierarchical
organization can associate a query with the relevant subordinate agents or
underlying text databases based on only a small set of features. This ability
to restrict attention to a small feature set avoids an overwhelming increase in
training cost and also makes the BPN of each neural net agent less subject to
overfitting (Langley 1988), even for text database discovery over a large set
of text databases. As a result, the hierarchical organization of the
multi-agent IR system can utilize the hierarchical topic structure to
efficiently distribute a text database discovery task over a set of neural net
agents. The effect of hierarchical organization will be further illustrated in
Section 4.

3.2 Training for Collaborative IR 

In the hierarchically organized multi-agent IR system, the query issuer of each
neural net agent is the superordinate agent of that neural net agent.^2
Therefore, each higher-level neural net agent should provide the relevance
feedback as well as the query to its subordinate neural net agents in order to
train them. In principle, the root neural net agent receives from the human
user the relevance feedback on the documents presented for a given query, and
thus it can propagate the feedback information to the subordinate agents so
that the training procedure can be applied to them. The hierarchically
organized multi-agent IR system is trained as follows:

Step 1: The root neural net agent broadcasts a given query to the subordinate
agents and this broadcast proceeds top-down to the underlying text
databases.

Step 2: Each underlying text database submits the potentially relevant
documents to the query issuer and this submission proceeds bottom-up to the
root neural net agent.

Step 3: The root neural net agent presents the union of all submitted
documents to the human user.

Step 4: The root neural net agent receives the feedback judgment on the
relevance of documents from the human user.

Step 5: The root neural net agent propagates the feedback results to its
associated subordinate agents and this propagation proceeds top-down to the
bottom-level agents.

Step 6: Repeat Step 1 to Step 5 for all queries given by the human user.

Each subordinate neural net agent is trained on the basis of the queries
broadcast in Step 1 and the feedback results propagated in Step 5, while the
root agent is trained on the basis of the queries and the feedback judgments
given directly by the human user.

^2 The superordinate agent of the root neural net agent is the human user.

Once sufficiently trained, each neural net agent in a hierarchically organized
multi-agent IR system sends a given query to some of its subordinate agents or
underlying text databases on the basis of its trained BPN and then presents the
documents submitted by them to the query issuer. As a result, the neural net
agents in a hierarchically organized multi-agent IR system can collaboratively
retrieve the relevant documents for a given query without exhaustively
traversing all underlying text databases.
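
The selective, top-down routing described above can be sketched as a recursive
traversal. This is an illustration under stated assumptions: the per-agent
scores are hypothetical stand-ins for the output of each trained BPN, and one
query pass is counted for each edge along which the query is sent.

```python
def route_query(node, children, scores, r):
    """Return (text databases reached, number of query passes) below `node`.

    children: maps each agent to its subordinate agents or text databases.
    scores:   maps each agent to {child: recall-phase output for this query}.
    r:        threshold constant of the Query Sender.
    """
    if node not in children:  # a text database is a leaf of the tree
        return [node], 0
    reached, passes = [], 0
    for child in children[node]:
        if scores[node][child] > r:  # send the query iff the output exceeds r
            sub_reached, sub_passes = route_query(child, children, scores, r)
            reached += sub_reached
            passes += 1 + sub_passes  # one pass for the edge node -> child
    return reached, passes


# Hypothetical two-level hierarchy with hypothetical scores for one query.
children = {"a1": ["a2", "a3"], "a2": ["d1", "d2"], "a3": ["d3", "d4"]}
scores = {
    "a1": {"a2": 0.9, "a3": 0.1},
    "a2": {"d1": 0.8, "d2": 0.2},
    "a3": {"d3": 0.7, "d4": 0.6},
}

print(route_query("a1", children, scores, 0.5))  # (['d1'], 2)
```

Because a3 is never queried, the databases below it are skipped entirely even
though their own scores would have passed the threshold.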



3.3 Analysis 

In this section, we analyze the performance of the suggested multi-agent
approach compared with the single neural net agent approach in terms of
training cost and communication cost. We define the training cost and the
communication cost as follows:

Definition 3. Training cost is the number of training cycles for the BPN learn- 
ing procedure to finish. 

Definition 4. Communication cost is the total number of query passes to re- 
trieve relevant documents for a given query. 

For the analysis of the training cost, we use the following assumption: 

Assumption 1. The training cost of neural net agent is only determined by the 
number of text databases that the neural net agent learns about. 

Generally, the training cost of BPN is dominated by the complexity of its
computational task, and the complexity of the text database discovery problem
is determined by the number of distinguishing features between text databases
that must be extracted in order to associate each query with its related text
databases. Therefore, the above assumption generally holds, since the number of
distinguishing features between text databases is mainly determined by the
number of text databases. Thus, the training cost is the same for all neural
net agents that learn about the same number of text databases. In the
multi-agent IR system, when the number of text databases that each neural net
agent learns about is kept below an acceptable constant by organizing the
neural net agents hierarchically, the whole training cost is O(n), where n is
the number of text databases.^3 In particular, when a new text database is
added to the existing hierarchical multi-agent organization, the training cost
for the scale-up is O(log n), because the addition affects only a limited part
of the hierarchical organization. On the other hand, in the single neural net
agent approach, the training cost increases radically as the number of text
databases goes beyond a tolerable level due to the scalability problem.

^3 n corresponds to the number of neural net agents in the tree representing
the hierarchically organized multi-agent IR system.

For the analysis of the communication cost, we use the following assumption:

Assumption 2. The number of text databases that provide relevant documents 
for a given query is always smaller than some constant value. 

Since documents on the Web are naturally partitioned into many categorized 
text databases, the relevant documents for a given query are usually found in 
one or a few text databases. Therefore, Assumption 2 is practically feasible. 

In the hierarchically organized multi-agent IR system, the communication
cost is O(log n), where n is the number of text databases.^4 On the other hand,
in the single neural net agent approach, the communication cost might be O(1),
provided a single neural net agent could sufficiently learn about all available
text databases. However, this is an ideal case because the single neural net
agent approach has trouble with the scalability problem.
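
The O(n) and O(log n) bounds can be made concrete with a short sketch. It
rests on a simplifying assumption that the analysis itself does not state: the
hierarchy is a complete k-ary tree, so every agent has exactly k subordinates
and all text databases lie at the same depth. The function name is
hypothetical.

```python
def hierarchy_size(n_databases, k):
    """Return (number of agents, depth in edges) of a complete k-ary hierarchy.

    Assumes k >= 2 and that n_databases is an exact power of k.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    depth, leaves = 0, 1
    while leaves < n_databases:
        leaves *= k
        depth += 1
    if leaves != n_databases:
        raise ValueError("n_databases must be a power of k")
    # Internal nodes of a complete k-ary tree with n leaves: (n - 1) / (k - 1).
    agents = (n_databases - 1) // (k - 1)
    return agents, depth


print(hierarchy_size(16, 4))    # (5, 2)
print(hierarchy_size(64, 4))    # (21, 3)
print(hierarchy_size(1024, 4))  # (341, 5)
```

With k fixed, the number of agents grows linearly in n, which gives the O(n)
total training cost when each agent's cost is constant (Assumption 1). The
depth grows as log_k n, which bounds both the number of agents on the path
from the root to a newly added text database and the number of query passes
needed to reach one relevant text database.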

4 Experiments 

We evaluated the performance of our neural net agent approach on the popular
search directories of Yahoo! Korea.^5 Yahoo! Korea provides hierarchically
organized directories in the Korean language according to various categories,
each of which serves as a text database that retrieves the documents
potentially relevant to a given query for its category. Table 1 summarizes the
16 directories selected for our experiments.



Table 1. Summary of the 16 directories of Yahoo! Korea

Directory Category        Number of    Directory Category    Number of
                          Documents                          Documents
----------------------    ---------    ------------------    ---------
Physics                   102          Economics             129
Chemistry                 100          Psychology            48
Biology                   314          Geography             67
Astronomy                 91           Urban Architecture    89
Electrical Engineering    217          Performing Arts       165
Computer Science          195          Sports                206
Mechanics                 114          Korean Arts           105
Material Science          56           Health                343


We have constructed an experimental Single Neural net Agent (SNA) that
operates on the above 16 directories as the available text databases, and
we provided on-campus access to the SNA via the Web. From Dec. 3, 1998 to
Dec. 28, 1998, 133 users issued 797 queries, each of which is composed of
terms joined with the AND connective. For each query, the SNA in the training
phase searched the 16 text databases exhaustively and identified several (or
sometimes zero) relevant documents from the relevance feedback of the user,
where the user was requested to examine all the documents presented by the IM.
We considered only the queries for which the user actually examined all the
documents presented by the IM and at least one relevant document was
identified. Thus, we collected 734 examples, each of which is given by a query
and the documents relevant to that query. In the collected examples, the topics
of queries covered all the 16 directory categories, and the relevant documents
for each query were usually found in only one directory and sometimes in a few
directories. Out of the 734 collected examples, 657 examples were randomly
selected as the training examples and the remaining 77 examples were used as
the test examples.

^4 O(log n) corresponds to the maximum number of edges from the root to an
underlying text database in the tree representing the hierarchically organized
multi-agent IR system.

^5 Yahoo! Korea, http://www.yahoo.co.kr, was used for our experiments due to
its accessibility with a relatively low communication delay.

We have also constructed an experimental Hierarchically Organized Multi-agent
IR System (HOMIRS) that operates on the same 16 directories as the SNA. To
determine the topology of the HOMIRS, we used as the topical hierarchy the
hierarchy of directory categories that Yahoo! Korea provides. Thus, the HOMIRS
has 5 neural net agents, each of which has 4 subordinate agents (or 4
underlying text databases), as shown in Figure 3. In this figure,
$d_1, d_2, \ldots, d_{16}$ represent the selected 16 directories of Yahoo!
Korea in Table 1, and $a_2$, $a_3$, $a_4$ and $a_5$ represent the neural net
agents for the “Natural Science”, “Engineering”, “Social Science” and
“Culture” categories, respectively.

In our experiments, the size of the BPN input layer was set to the number, 
221, of all index terms that appeared in the 734 queries, the size of the BPN 
output layer was set to the number, 16, of all available text databases for the 
SNA, and also set to the number, 4, of subordinate neural net agents (or under- 
lying text databases) which each neural net agent learns about for the HOMIRS. 
Other BPN parameters are specified in Table 2. 



Table 2. BPN parameters

Hidden Layer    Learning Rate    Bias Weight    Maximum of Acceptable
                                                Average Squared Error
------------    -------------    -----------    ---------------------
100             0.005            0.2            0.05




We trained both the SNA and the HOMIRS^6 using the 657 training queries
and their relevance feedback results from the training examples, and then
measured their respective performance with respect to the 77 test queries from
the test examples. In the HOMIRS, these training and test queries were all
given to the root neural net agent, $a_1$ in Figure 3. To measure the
effectiveness of the two experimental systems, we defined the precision and the
recall ratio as follows:

^6 The SNA and the HOMIRS can be accessed at http://agent.snu.ac.kr/Wa. It also
provides training and test query lists.



$$\text{precision} = \frac{\text{the number of relevant documents retrieved}}{\text{the number of documents retrieved}}$$

$$\text{recall ratio} = \frac{\text{the number of relevant documents retrieved}}{\text{the number of all the relevant documents in the test example}}$$
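
For a single test query, the two measures can be computed as in the following
minimal sketch. Document identifiers are hypothetical, and returning 0.0 when
a denominator is empty is a convention assumed here, not one stated in the
text.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall ratio) for one query.

    retrieved: documents returned by the system for the query.
    relevant:  all documents judged relevant in the test example.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


p, rc = precision_recall(["doc1", "doc2", "doc3", "doc4"], ["doc1", "doc2", "doc5"])
print(p, rc)  # precision is 2/4, recall ratio is 2/3
```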

Results obtained by evaluating the entire set of the 77 test queries for various 
threshold values of the QS are shown in Table 3, where the average performance 
values from the 77 test queries are used for each threshold value in terms of 
precision, recall ratio, and communication cost. This table also shows the training 
cost for the SNA and the HOMIRS. For the HOMIRS, the training cost is the 
sum of the training costs of the five neural net agents. 



Table 3. Experimental Results

                                          threshold constant r
performance measure             0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
-----------------------------   ----  ----  ----  ----  ----  ----  ----  ----  ----
precision            SNA        0.72  0.75  0.80  0.80  0.82  0.84  0.84  0.83  0.84
                     HOMIRS     0.74  0.76  0.78  0.80  0.82  0.83  0.83  0.83  0.85
recall ratio         SNA        0.94  0.93  0.92  0.88  0.88  0.88  0.86  0.82  0.79
                     HOMIRS     0.94  0.92  0.91  0.87  0.87  0.86  0.84  0.83  0.78
communication cost   SNA        3.24  2.67  2.52  2.33  2.23  2.08  1.94  1.78  1.65
                     HOMIRS     5.06  4.34  3.96  3.68  3.57  3.37  3.26  3.11  2.85
training cost        SNA        931
                     HOMIRS     810 (=237+204+132+123+114)


From the table, we can see that the precision generally improves while the
recall ratio tends to decrease as the threshold constant r increases from 0.1
to 0.9 in steps of 0.1. The difference between the SNA and the HOMIRS in
precision and recall ratio is not significant. We can also see that the SNA
requires more training cost than any single neural net agent of the HOMIRS,
and even more than all the agents of the HOMIRS combined (931 versus 810
training cycles). Thus, we claim that the total training cost may be reduced by
organizing the neural net agents hierarchically. The training time could be
further reduced if the neural net agents of the HOMIRS were trained in parallel
on different platforms. Thus, if the number of available text databases is
quite large, the hierarchical organization of neural net agents is expected to
be more efficient, with a drastically reduced training cost.

The communication cost of the SNA is always smaller than that of the
HOMIRS. Therefore, the experimental results on the communication cost
show that the single-agent approach uses fewer network resources than the
multi-agent approach.
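
The size of this trade-off can be read directly from Table 3. The short
calculation below uses only the figures reported there; the variable names are
hypothetical, and treating the largest per-agent cost as the parallel training
cost assumes, for illustration only, that a training cycle costs the same on
every agent.

```python
# Training cost (number of training cycles) from Table 3.
sna_training = 931
homirs_training = [237, 204, 132, 123, 114]

total = sum(homirs_training)                  # 810, as reported in Table 3
saving = (sna_training - total) / sna_training
parallel_bound = max(homirs_training)         # cycles if all agents train in parallel

# Average communication cost at threshold r = 0.5, from Table 3.
comm_sna, comm_homirs = 2.23, 3.57
overhead = comm_homirs / comm_sna

print(total, parallel_bound)
print(f"training saving: {saving:.1%}")
print(f"communication overhead: {overhead:.2f}x")
```

On these figures the hierarchy saves about 13% of the training cycles while
sending roughly 1.6 times as many queries per request at r = 0.5.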

5 Conclusion

In this paper, we proposed a hierarchically organized multi-agent approach to
the text database discovery problem in order to overcome the scalability problem
which the single neural net agent approach may face. Our multi-agent approach
provides a scalable method for retrieving the desired documents effectively from
the distributed text databases.

In our experimental system, HOMIRS, we identified the trade-off between the
improvement in the total training cost and the overhead of extra communication
cost. We expect that, as the multi-agent organization grows, the benefit of
reduced training cost in the HOMIRS will increasingly outweigh the
communication overhead needed to maintain the larger organization.
Furthermore, when the neural net agents all reside on a single platform, the
communication cost between neural net agents may be negligible, and thus the
communication overhead in the multi-agent approach may be insignificant.

Our multi-agent approach also enables scaling up the number of text databases
without incurring a radical increase in training cost, because adding a text
database affects only a limited part of the hierarchical agent organization.
Therefore, the hierarchical organization of neural net agents makes our
approach more feasible and practical in a real IR environment where the number
of available text databases is quite large.

Currently, we are actively investigating a new mechanism to extend our work 
into an open system. In this system, each neural net agent can dynamically join 
or leave the collaborative organization, and the text databases are subject to 
asynchronous changes of their themes, contents, and structures. 

Abstract. The incompleteness and uncertainty about the state of the
world and about the consequences of actions are unavoidable. If we want 
to predict the performance of multiuser computing systems, we have the 
uncertainty of what the users are going to do, and how that affects sys- 
tem performance. Intelligent interface agent development is one way to 
mitigate the uncertainty about user behaviors by predicting what users 
will do based on learned users’ behaviors, preferences, and intentions. 

This work focuses on developing user models that can analyze and pre- 
dict user behavior in multi-agent systems. We have developed a formal 
theory of user behavior prediction based on hidden Markov models. This 
work learns the user model through a time-series action analysis and 
abstraction by taking users’ preferences and intentions into account in 
order to formally define user modeling. 

1 Introduction 

The incompleteness and uncertainty about the state of the world and about the 
consequences of actions are unavoidable. If we want to predict the performance 
of multiuser computing systems, we have the uncertainty of what the users are 
going to do, and how that affects system performance. Intelligent interface agent 
development is one way to mitigate the uncertainty about user behaviors by pre- 
dicting what users will do based on learned users’ behaviors, preferences, and 
intentions. The agents here are for managing resources, specifically, assessing 
the likelihood of upcoming demands by users on limited resources. In our multi- 
agent system, learning interface agents acquire plans of using resources from 
users’ behaviors by recognizing patterns and intentions of users, and predictive 
agents represent the learned plans (patterns) and predict users’ future behav- 
iors regarding resource usage with the use of probabilistic models. Overall, the 
agents in the multi-agent system coordinate together to support an available and 
reliable system by providing timely predictions of the use of system’s resources 
at any time. 

This work particularly focuses on developing user models that can analyze
and predict user behavior. The user modeling field currently lacks strong
foundations and formal theories, making it difficult to assess the feasibility
and applicability of user models. Much work to date has resulted in ad hoc
approaches, such as simply capturing user preferences at a shallow level, which
have tended to be applicable only to highly specialized and narrow domains. We
have developed a formal theory of user behavior prediction based on hidden
Markov models. This work learns the user model through a time-series action
analysis and abstraction by taking users’ preferences and particular intentions
into account in order to formally define user modeling. This approach is
sufficiently general to apply to a variety of domains.

This work considered two general issues: one, whether a system using individ- 
ual user models could be practical and perform better than one using aggregated 
user models, and two, whether a combination of symbolic and probabilistic rea- 
soning could be better than either one individually. Results to date suggest that 
individual user models can be effectively learned and used, and that symbolic 
and probabilistic reasoning can be beneficially combined for this problem. The 
test domain is the prediction of user behaviors in the UNIX domain. 

2 Related Work 

The general area of inferring the goals and intentions of users is commonly
known as plan recognition [1,2]. The work in plan recognition has focused on
inferring plans to offer qualified help, to understand stories, and to detect
goal conflicts or potential collaborations. In the past, the integration of
probabilistic reasoning into plan recognition has generally been based on the
traditional assumptions of plan recognition, such as having the complete plan
structure [4], and/or that the observed actions are all purposeful and
explainable [3]. Recently, machine learning techniques have been employed to
acquire plan libraries in an effort to overcome these restrictions [5], [6],
[7]. Bauer [5] applies decision trees to obtain regularities of user behavior
within a plan hierarchy and uses the Dempster-Shafer theory to reason about a
user’s actions for assessing plan hypotheses. The work by Lesh and Etzioni [6]
uses version spaces [8] to represent the relations between actions and possible
goals and pursues recognizing a goal by pruning inconsistent actions and goals.
Albrecht et al. [7] use a dynamic belief network in order to guess a player’s
current quest and predict his/her next action and location in the “Shattered
Worlds” Multi-User Dungeon (MUD) domain. Once a particular structure of their
Bayesian network is decided, it uses, without the notion of a plan, a
brute-force approach to collect statistical data about the coincidental
occurrence of a player’s action, a location, and a quest. In this work, we
decouple plan recognition from probabilities: action analysis is used to learn
plans (patterns), and probabilistic models are used to represent plans.

3 The Problem and Approach 

We consider the following prediction problem: what is the likelihood that a user
will use some system resource in the near future, given his or her recent
actions?

Input:  Wg is a sequence of actions in a window size for each observation
Output: Partial sequences (PS) extracted from observations
        Input parameters for the prediction system
        (i.e., state transition probability, initial state distribution,
        output probability)
Algorithm:
  1. Extract partial sequences by finding correlations
  2. Determine hidden states for each partial sequence
  3. Generate or update parameters

Fig. 1. Algorithm for Building User Models




The idea is that actions can be used to recognize when a user is executing a plan 
that uses one of our resources of interest. This is complicated by issues such 
as ambiguity (actions can suggest more than one plan), distraction (users get 
sidetracked and do something superfluous), and interleaved execution (user is 
working on more than one plan at a time). The approach we use is to preprocess 
the observations using general action knowledge and a time-series analysis, and 
use hidden Markov models (that we have learned) for prediction. 

The agent system for the recognition/prediction problem has two major 
parts: building user models and using them to predict the resource usages from 
the users. User-dependent information is collected and used to build individual 
models. A tool for automatic collection of data was developed that would collect 
both the commands given by the user and the responses coming back from the 
system. The user models are then developed from coherent partial sequences 
that are extracted from these collected data. This extraction is based on corre- 
lations of actions, and is further explained below. The learned patterns/plans 
demonstrate user preferences and regularities, that is, how the user behaves in 
using particular resources. 

The algorithm in Figure 1 describes how each user model is built. The input is a
sequence of actions $W_s$ taken from a window whose size is fixed experimentally.
We consider only the temporal order of observations rather than their actual times.
However, a change of date or a long idle time between actions can be a good
indicator for splitting a stream of commands before action reasoning, so we
plan to include such analysis in the future.

Partial sequences are extracted using the knowledge of general actions and the
coherence rules shown in Figure 2. Correlations among actions are determined by
action knowledge such as command and argument coherence, data dependency,
anytime action, redundant action, and conditional sequence actions. Correlations
are used as contextual temporal information to extract coherent partial sequences.
The Path of an action is the working directory in which the action was issued;
Paths are compared to make sure that the actions being tested for correlation
were issued in the same directory. The Arg represents any argument an action
might have. The Effect is the result of an action as returned by the UNIX system,
and the Time gives the order of the two actions. For instance, if the current
action takes an argument from the result of the previous action, the two actions
are linked by the data dependency rule, and the link records that they are
correlated through a data dependency relation.

Data Dependency Rule (DDR)

IF   Path(Ai) and Path(Aj) are equal AND
     Arg(Aj) is from Effect(Ai) AND
     Time(Ai) < Time(Aj)
THEN Link(Ai -> Aj) as DDR

Action Coherence Rule (ACR)

IF   Path(Ai) and Path(Aj) are equal AND
     Arg(Ai) and Arg(Aj) are equal or compatible AND
     Time(Ai) < Time(Aj)
THEN Link(Ai -> Aj) as ACR

Redundant Action Rule (RAR)

IF   Path(Ai) and Path(Aj) are equal AND
     Arg(Ai) and Arg(Aj) are equal AND
     Effect(Ai) and Effect(Aj) are equal AND
     Time(Ai) < Time(Aj)
THEN Link(Ai -> Aj) as RAR

Fig. 2. General Action Knowledge
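The three rules of Figure 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Action` record, the order in which the rules are tried (most specific first), and the reading of "compatible" arguments as "sharing a file stem" are choices made for the sketch, not details given in the paper. The effects attached to the sample commands (for instance, that latex produces `x.dvi`) are likewise hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One observed command (field names are illustrative assumptions)."""
    name: str
    path: str          # working directory in which the command was issued
    args: frozenset    # arguments of the command
    effect: frozenset  # items returned or produced by the command
    time: int          # position in the observed sequence


def _stems(names):
    """File names without extension; used as a stand-in for 'compatible'."""
    return {os.path.splitext(n)[0] for n in names}


def link(a, b):
    """Return 'RAR', 'DDR', 'ACR', or None for the ordered pair (a, b)."""
    if a.path != b.path or not a.time < b.time:
        return None                      # every rule needs same Path and Time order
    if a.args == b.args and a.effect == b.effect:
        return "RAR"                     # redundant action
    if b.args & a.effect:
        return "DDR"                     # Arg(Aj) is taken from Effect(Ai)
    if a.args and b.args and (a.args == b.args or _stems(a.args) & _stems(b.args)):
        return "ACR"                     # equal or compatible arguments
    return None


ls = Action("ls", "Papers", frozenset(), frozenset({"x.tex", "y"}), 1)
latex = Action("latex", "Papers", frozenset({"x.tex"}), frozenset({"x.dvi"}), 2)
prtex = Action("prtex", "Papers", frozenset({"x.tex"}), frozenset(), 3)

print(link(ls, latex))     # DDR: latex takes x.tex from the output of ls
print(link(latex, prtex))  # ACR: both commands act on x.tex
```

Following the links from each action then yields the coherent partial sequences; actions that end up with no link to any other action are candidates for extraneous commands.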

In a command-driven system like UNIX, a plan that uses a particular resource
is identified by the presence of distinguished actions. For example, “lpr” in a plan
indicates that the plan uses a printer. For abstraction, the distinguished actions
in each plan are used as a fixed feature to determine the underlying state (the
resources used) of each subsequence. We define an event for each resource of
interest: the set of possible plans that use that resource. Suppose a coherent
partial sequence extracted from an observation $W_s$ in the training phase is
‘latex-compress-prtex-ftp’. Since prtex is present, this sequence is an element of
event $E_1$, the set of “PrinterUse” plans. Since ftp is present, it is an element
of event $E_2$, the “RouterUse” plans. Since latex is present, it is an element of
event $E_3$, the “MemoryUse” plans. Therefore, the state of this sequence is the
set of events it belongs to, $\{E_1, E_2, E_3\}$, which reflects the use of multiple
resources, and the sequence is represented by the state $S_{123}$ in this case.
This definition of state has the useful feature that each element of the sample
space is in exactly one state, so the states are disjoint. The event probabilities
relate to the state probabilities in the obvious way: the probability of an event
is the sum of the probabilities of the states that include that event. The
parameters generated from these events are the inputs to the prediction system
of each hidden Markov model, which constructs a probabilistic plan structure
for each user. This work puts an emphasis on producing better parameters for
the hidden Markov models (HMMs) [9], namely the initial state distribution, the
state transition probabilities, and the output probabilities. They are produced
through data analysis, by filtering and extracting only the relevant information
and using it to disclose the hidden states of behaviors from real data, instead of
guessing the HMM parameters at random.
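As a minimal sketch of this abstraction step, the state of a partial sequence can be read off from the distinguished commands it contains. The mapping below uses only the commands the text associates with a resource (lpr and prtex with the printer, ftp and mail with the router, latex with memory); the dictionary and function names are hypothetical, and representing a state as a sorted tuple of event indices is a convenience of the sketch.

```python
# Distinguished command -> event index (1 = PrinterUse, 2 = RouterUse, 3 = MemoryUse).
DISTINGUISHED = {"lpr": 1, "prtex": 1, "ftp": 2, "mail": 2, "latex": 3}


def state_of(sequence):
    """Return the state of a hyphen-separated partial sequence.

    The state is the set of events the sequence belongs to, written as a
    sorted tuple of event indices: (1, 2, 3) stands for S_123, () for "Others".
    """
    commands = sequence.split("-")
    return tuple(sorted({DISTINGUISHED[c] for c in commands if c in DISTINGUISHED}))


print(state_of("latex-compress-prtex-ftp"))  # (1, 2, 3), i.e. S_123
print(state_of("cd-ls-prtex"))               # (1,), i.e. S_1
```

Because every sequence maps to exactly one tuple, the states are disjoint, which is the property the text relies on when it sums state probabilities to obtain event probabilities.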

We are interested in estimating the probability of a sequence being in an
event based on observing part of that sequence. Suppose we are considering three
events, and observe as input the sequence ‘latex-compress-prtex-ftp’. The
predictions that we wish to make are (at time $t_1$) the probabilities
$P(E_1 \mid \text{‘latex’})$, $P(E_2 \mid \text{‘latex’})$, and $P(E_3 \mid \text{‘latex’})$;
(at time $t_2$) $P(E_1 \mid \text{‘latex-compress’})$, $P(E_2 \mid \text{‘latex-compress’})$,
$P(E_3 \mid \text{‘latex-compress’})$; and so on through time $t_4$.

Using the definition of conditional probability and Bayes' rule, we compute
the likelihood that the coherent partial sequence is an exact pattern, or part
of a pattern, of using each resource, where the patterns are learned from previous
observations. That is, the problem of interest here is calculating the probabilities
of using the various resources given a subsequence of commands $w$: in other
words, the probability of any state containing the appropriate event given the
subsequence, that is,

$$P(E_j \mid w) = \sum_{i \,:\, E_j \in S_i} P(S_i \mid w) \quad \text{for resource } j \tag{1}$$

and the probability of each state $P(S_i \mid w)$ is

$$P(S_i \mid w) = P(S_i \,\&\, w)/P(w) = P(S_i)\,P(w \mid S_i)/P(w) \tag{2}$$

In order to make such a prediction, the appropriate model needs to be built: 
that is, we need to be able to estimate the above probabilities for any state and 
subsequence. To encode this information, we build a formal model that includes 
local context to improve its reliability [10]. This local context is provided by 
using the correlated subsequences as described above. 
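Equations (1) and (2) translate directly into code once the prior $P(S_i)$ and the likelihood $P(w \mid S_i)$ are available. The sketch below assumes that states are written as tuples of event indices and that both quantities are supplied as dictionaries; the function names and the toy numbers in the usage lines are hypothetical and serve only to show the mechanics.

```python
from fractions import Fraction


def state_posteriors(prior, likelihood):
    """Equation (2): P(S_i | w) = P(S_i) * P(w | S_i) / P(w).

    prior maps each state to P(S_i); likelihood maps states to P(w | S_i)
    (states missing from likelihood are treated as having probability 0).
    """
    joint = {s: prior[s] * likelihood.get(s, 0) for s in prior}
    p_w = sum(joint.values())          # P(w), summed over all states
    if p_w == 0:
        raise ValueError("the subsequence has zero probability under the model")
    return {s: p / p_w for s, p in joint.items()}


def event_probability(event, posteriors):
    """Equation (1): sum P(S_i | w) over the states that contain the event."""
    return sum(p for state, p in posteriors.items() if event in state)


# Toy numbers, chosen only for illustration.
prior = {(1,): Fraction(1, 2), (1, 2): Fraction(1, 4), (2,): Fraction(1, 4)}
likelihood = {(1,): Fraction(1, 2), (1, 2): Fraction(1), (2,): Fraction(0)}
post = state_posteriors(prior, likelihood)
print(event_probability(1, post), event_probability(2, post))  # 1 1/2
```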

4 The Model 

4.1 Formal Model Definition 

The formal hidden Markov model definition for a reference profile is given below:

$$M = (S, \Sigma, A, B, \alpha, S_I, S_F)$$

$M$ is the reference profile model for a user.

$S$ is the set of states in the model. Each of these states has a unique integer
identifier, and is identified as $S_i$ for identifier $i$.

$\Sigma$ is the set of distinct observation symbols.

$A$ is the state transition probability distribution, $A = \{a_{ij}\}$, where $a_{ij}$ = the
probability of being in state $j$ at time $t + 1$ if in state $i$ at time $t$.

$B$ is the observation symbol probability distribution in state $j$, $B = \{b_j(k)\}$,
where $b_j(k)$ = the probability that $k$ is emitted while in state $j$.

$\alpha$ is the initial state distribution, $\alpha = \{\alpha_i\}$, where $\alpha_i$ = the probability of being
in state $i$ at time 1 (given that the model is in state $S_I$ at time 0).

$S_I$ is an initial state that occurs at the beginning of any state sequence.

$S_F$ is a final state at the end of any complete state sequence. Neither $S_I$ nor $S_F$
can occur anywhere else, they do not emit symbols, and $S_F$ does not lead
to any other state. $S_I, S_F \notin S$.

In this model, a plan is a sequence of observation symbols emitted by a
sequence of states visited in a traversal from $S_I$ to $S_F$. We restrict this model by
having the states correspond to possible subsets of resources used in a sequence.
Therefore, a sequence can only have one state other than $S_I$ and $S_F$, so $a_{ij} = 0$
for $i \neq j$. The details of this formal theory for our problem can be found in our
previous paper [13].



4.2 Representations of Models 

The correspondence between events and states (above) defines the states for any
set of resources (as the subsets); it also means that for $m$ resources we need to
define $2^m$ states. We consider two possible models given these states: one where
the observation symbols are all possible coherent sequences, and another
where the observation symbols correspond to individual commands.





Fig. 3. Possible Markov Models: (a) Ideal Model, (b) Simplified Model



The first approach, which we term the ideal model, is to have all possible
coherent sequences as observation symbols and a state for each combination of
resources, as in (a) of Figure 3. The size of the set $\Sigma$ of observation symbols
for each state in the model is then equal to the possible number of coherent
sequences, bounded roughly by the number of possible actions raised to the
power of the window size. In practice, the probability may be nonzero for only a
fairly small subset of $\Sigma$, but that subset may still be prohibitively large. This
means that it may be quite unrealistic to try to obtain reliable probability values
from real observations. Each state for a combination of multiple resources can be
represented as a subset of the $m$ resources.

As an alternative (termed the simplified model), we have looked at using a single
command as an observation symbol, as in (b) of Figure 3. That is, a sequence
through the states involves going from $S_I$ to some state $S_i$, emitting the first
command, going from $S_i$ to $S_i$, emitting the second command, and so forth, until
the sequence is finished. In this model, the size of the observation symbol set
$\Sigma$ is kept to the number of possible UNIX commands, so the problem of obtaining
(and storing) probabilities for observation symbols is mitigated, and traversing
one step at a time corresponds directly with the incremental predictions (more
below). However, the model is only an approximation in terms of the meaning of
the states: there will exist legitimate traversals through a state that do not use
the appropriate resources, but their probability may be small enough in practice
not to greatly affect the predictions.

Once we have the model and an observation $w$, we can calculate the probability
of a state using equation (2). To do so, we need to calculate $P(S_i)$, $P(w \mid S_i)$,
and $P(w)$ from our model. It should be noted that in the ideal model, $a_{i,i} = 0$
for all states $i$, so all traversals are two transitions and one emission; the partial
sequences that we observe, however, may be prefixes of the observation symbols.
For both models, $P(S_i) = \alpha_i$, and $P(w)$ is simply the sum over all states of
$P(S_i)\,P(w \mid S_i)$. $P(w \mid S_i)$, however, is calculated differently for the different
models. For the ideal model, $P(w \mid S_i)$ is the sum of the probabilities of all
strings of which $w$ is a prefix, that is,

$$P(w \mid S_i) = \sum_{v \,:\, w \text{ is a prefix of } v} b_i(v).$$

For the simplified model, $P(w \mid S_i)$ is the product of the probabilities of emitting
the first symbol, going to the same state, emitting the second symbol, and so forth
to the length of $w$. For simplicity, define $\beta_i = a_{i,i}$ (since the rest of the $a_{ij}$'s are
zero). Let $w = c_1 c_2 \cdots c_n$. Then

$$P(w \mid S_i) = b_i(c_1)\,\beta_i\,b_i(c_2)\,\beta_i \cdots b_i(c_n) = \beta_i^{\,n-1} \prod_{k=1}^{n} b_i(c_k)$$
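The two ways of computing $P(w \mid S_i)$ can be sketched side by side. In this sketch a sequence is a tuple of commands, the ideal model's emission table for one state maps whole sequences to $b_i(v)$, and the simplified model's table maps single commands to $b_i(c)$. The function names, the handling of an empty observation, and the toy tables are assumptions made for illustration.

```python
from fractions import Fraction


def ideal_likelihood(w, emissions):
    """Ideal model: sum of b_i(v) over all whole sequences v that start with w."""
    return sum(p for v, p in emissions.items() if v[:len(w)] == w)


def simplified_likelihood(w, beta, b):
    """Simplified model: beta_i^(n-1) times the product of b_i(c_k), k = 1..n."""
    if not w:
        return 1                       # assumption: nothing observed yet
    prob = beta ** (len(w) - 1)        # n - 1 self-transitions
    for command in w:
        prob *= b.get(command, 0)      # unseen commands have probability 0
    return prob


# Toy emission tables for a single state, for illustration only.
sequences = {("a", "b"): Fraction(1, 2), ("a", "c"): Fraction(1, 4), ("b",): Fraction(1, 4)}
commands = {"a": Fraction(1, 2), "b": Fraction(1, 4)}

print(ideal_likelihood(("a",), sequences))                           # 3/4
print(simplified_likelihood(("a", "b"), Fraction(1, 2), commands))   # 1/16
```

The ideal version needs a table entry for every whole sequence seen in training, while the simplified version only needs one entry per command and can be extended one command at a time, which is the trade-off discussed above.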

The values of $\alpha$ (and $\beta$ for the simplified model) are good indicators of general
user behavior. A high probability $\alpha_i$ denotes high use of the resources
corresponding to state $i$, and $\alpha_i = 0$ implies no use of the particular resources
corresponding to that state. $\beta$ (simplified model) indicates a user's tendency
toward long or short plans: if $\beta_i = 0$ and $\alpha_i \neq 0$, then the plans
corresponding to state $i$ are always a single action; larger $\beta$ values correspond
to a tendency toward longer plans.

5 An Example 

5.1 Learning Plans 



W: source .login
A: cd Papers
   >
B: ls
   > x.tex y
L: more y
   >
M: compress y
C: latex x.tex
   >
D: prtex x.tex
   >
N: mail y.Z




(a) (b) 

Fig. 4. The Correlations of an Example 



Let the current observation $W_s$ be (a) in Figure 4. Applying the conditional
sequence (S.C), data dependency (D.D), and argument coherence (A.C) rules
separates the sequence into three coherent subsequences, (W), (A-B-C-D), and
(A-B-L-M-N), as in (b) of Figure 4. These subsequences are abstracted to the
“Others”, “PrinterMemoryUse”, and “RouterUse” plans, respectively.



5.2 Application to Plan Recognition 

In this section we compare the simplified model with the ideal model using an
example. Suppose the coherent partial sequences observed so far to train the
model are
cd-latex-prtex, latex-compress-prtex-ftp, cd-latex-ftp-prtex,
cd-ls-prtex, cd-ls-prtex, cd-ls-latex-prtex, cd-ls-latex, ls-latex,
cd-ls-latex-ftp-prtex. From the final actions present in each partial sequence,
the state of the partial sequence ‘cd-latex-prtex’ is defined as $S_{13}$, that of
‘latex-compress-prtex-ftp’ as $S_{123}$, that of ‘cd-latex-ftp-prtex’ as $S_{123}$,
and so on. The trained ideal and simplified models are represented in Figure 5.

Fig. 5. The Models: (a) The Ideal Model, (b) The Simplified Model



Suppose that the models above, built from the training data, are sufficient to be
used. Then the question of whether each partial sequence belongs to each resource
plan is answered by computing the probabilities in equation (1) of Section 3.
Let the current observation of a sequence of actions in the test phase be (a) in
Figure 4. For example, the likelihood $L_{Printer}$ of using the resource ‘Printer’,
given a sequence $w$ = {cd-ls-latex} observed at time $t_c$, can be computed as

$$\begin{aligned}
L_{Printer} &= P(\text{‘PrinterUse’} \mid w) = P(\text{‘PrinterUse’} \mid \text{cd-ls-latex}) \\
&= \sum_{i = 1, 12, 13, 123} P(S_i \,\&\, \text{cd-ls-latex}) / P(\text{cd-ls-latex}) \\
&= \sum_{i = 1, 12, 13, 123} \alpha_i\, P(\text{cd-ls-latex} \mid S_i) / P(\text{cd-ls-latex})
\end{aligned}$$

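Using the example models and the output probabilities of both models in
Figure 5, $P_I(\text{cd-ls-latex})$ in the ideal model is computed as

$$\begin{aligned}
P_I(\text{cd-ls-latex}) &= \alpha_3 b_3(\text{cd-ls-latex}) + \alpha_{13} b_{13}(\text{cd-ls-latex}) + \alpha_{123} b_{123}(\text{cd-ls-latex}) \\
&= 2/9 \times 1/2 + 2/9 \times 1/2 + 3/9 \times 1/3 = 1/3
\end{aligned}$$

while $P_S(\text{cd-ls-latex})$ in the simplified model is computed from each
$P_S(\text{cd-ls-latex} \mid S_i)$ (with $P_S(\text{cd-ls-latex} \mid S_1) = 0$ since $b_1(\text{latex}) = 0$):

$$\begin{aligned}
P_S(\text{cd-ls-latex} \mid S_3) &= \beta_3^{2}\, b_3(\text{cd})\, b_3(\text{ls})\, b_3(\text{latex}) \\
&= 3/5 \times 3/5 \times 2/7 \times 1/7 \times 2/7 \approx .0042 \\
P_S(\text{cd-ls-latex} \mid S_{13}) &= \beta_{13}^{2}\, b_{13}(\text{cd})\, b_{13}(\text{ls})\, b_{13}(\text{latex}) \\
&= 5/7 \times 5/7 \times 2/9 \times 1/9 \times 2/9 \approx .0028 \\
P_S(\text{cd-ls-latex} \mid S_{123}) &= \beta_{123}^{2}\, b_{123}(\text{cd})\, b_{123}(\text{ls})\, b_{123}(\text{latex}) \\
&= 10/13 \times 10/13 \times 2/16 \times 1/16 \times 3/16 \approx .00087 \\
P_S(\text{cd-ls-latex}) &= \sum_{i = 3, 13, 123} \alpha_i\, P_S(\text{cd-ls-latex} \mid S_i) \\
&= 7/40 \times .0042 + 9/40 \times .0028 + 16/40 \times .00087 \approx .00172
\end{aligned}$$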
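The arithmetic for the simplified model can be checked with exact fractions. The sketch below only re-evaluates the products using the parameter values printed in the worked example; the dictionary layout and the state labels are illustrative. Evaluated exactly, the per-state values round to the printed .0042, .0028, and .00087, and the weighted sum comes to about 0.00171, close to the printed .00172 (the small gap is presumably due to rounding of intermediate values).

```python
from fractions import Fraction as F
from math import prod

# Parameters as printed in the worked example for the simplified model.
beta = {"S3": F(3, 5), "S13": F(5, 7), "S123": F(10, 13)}
b = {
    "S3":   {"cd": F(2, 7),  "ls": F(1, 7),  "latex": F(2, 7)},
    "S13":  {"cd": F(2, 9),  "ls": F(1, 9),  "latex": F(2, 9)},
    "S123": {"cd": F(2, 16), "ls": F(1, 16), "latex": F(3, 16)},
}
alpha = {"S3": F(7, 40), "S13": F(9, 40), "S123": F(16, 40)}
w = ["cd", "ls", "latex"]

# P_S(w | S_i) = beta_i^(n-1) * product of b_i(c_k) over the commands in w
likelihood = {s: beta[s] ** (len(w) - 1) * prod(b[s][c] for c in w) for s in beta}

# P_S(w) = sum over states of alpha_i * P_S(w | S_i)
p_w = sum(alpha[s] * likelihood[s] for s in beta)

for state, value in likelihood.items():
    print(state, float(value))
print("P_S(w) =", float(p_w))
```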

The likelihood of using the resource ‘Printer’ given the sequence $w$ =
{cd-ls-latex} can be computed by applying the numbers obtained from both
models to the equation for $L_{Printer}$:

$$\begin{aligned}
L_{Printer_I} &= \sum_{i = 1, 12, 13, 123} \alpha_i\, b_i(\text{cd-ls-latex}) / P_I(\text{cd-ls-latex}) \\
&= (0 + 0 + 1/9 + 1/9) / (1/3) = 2/3 \approx .667 \\
L_{Printer_S} &= \sum_{i = 1, 12, 13, 123} \alpha_i\, P_S(\text{cd-ls-latex} \mid S_i) / P_S(\text{cd-ls-latex}) \\
&= (0 + 0 + 0.00062 + 0.00029) / 0.00172 \approx .529
\end{aligned}$$
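The ideal-model figures can be reproduced directly from the nine training sequences by counting. The sketch below estimates $\alpha_i$ as the fraction of training sequences in state $i$ and $b_i(w)$ as the fraction of that state's sequences that start with $w$. This counting estimator is an assumption, chosen because it reproduces the values in the worked example; the helper names are hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction

TRAINING = [
    "cd-latex-prtex", "latex-compress-prtex-ftp", "cd-latex-ftp-prtex",
    "cd-ls-prtex", "cd-ls-prtex", "cd-ls-latex-prtex", "cd-ls-latex",
    "ls-latex", "cd-ls-latex-ftp-prtex",
]
RESOURCE = {"prtex": 1, "ftp": 2, "latex": 3}   # Printer, Router, Memory


def state_of(seq):
    """State = set of resources whose distinguished command occurs in seq."""
    return frozenset(RESOURCE[c] for c in seq.split("-") if c in RESOURCE)


by_state = defaultdict(Counter)
for seq in TRAINING:
    by_state[state_of(seq)][seq] += 1


def joint(state, w):
    """alpha_i * b_i(w), estimated by counting training sequences."""
    seqs = by_state[state]
    n_state = sum(seqs.values())
    alpha = Fraction(n_state, len(TRAINING))
    hits = sum(k for s, k in seqs.items() if s.split("-")[:len(w)] == w)
    return alpha * Fraction(hits, n_state)


def resource_likelihood(resource, w):
    """Equation (1) in the ideal model: posterior mass of states using resource."""
    joints = {s: joint(s, w) for s in by_state}
    p_w = sum(joints.values())
    return sum(p for s, p in joints.items() if resource in s) / p_w


w = ["cd", "ls", "latex"]
for name, index in [("Printer", 1), ("Router", 2), ("Memory", 3)]:
    print(name, resource_likelihood(index, w))   # 2/3, 1/3, 1
```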

The values of $L_{Printer}$ in the two models are marginally different. How large
the difference is in general is not examined in this paper, apart from the
comparison of prediction accuracy in Section 6. The claim in this work is that the
simplified model is an approximation that is good enough for computing over a
sequence of actions, with the advantage that the computation proceeds one action
at a time.

Likewise, the conditional probabilities that the partial sequence
{cd-ls-latex} occurs in each of the other plans are computed as below.

$$L_{Router} = P(\text{‘RouterUse’} \mid \text{cd-ls-latex}) = \sum_{i = 2, 12, 23, 123} P(S_i \mid \text{cd-ls-latex})$$

$$L_{Router_I} = 1/3 \approx 0.333, \qquad L_{Router_S} \approx 0.248$$

$$L_{Memory} = P(\text{‘MemoryUse’} \mid \text{cd-ls-latex}) = \sum_{i = 3, 13, 23, 123} P(S_i \mid \text{cd-ls-latex})$$

$$L_{Memory_I} = (1/3)/(1/3) = 1, \qquad L_{Memory_S} \approx 0.0004/0.0004 = 1$$

With the partial sequence cd-ls-latex, the resources the current user is likely to
use are predicted to be Memory, Printer, and Router, in that order. The computation
in the ideal model requires pre-computing the probabilities for all possible
prefixes of partial sequences. The ease of computation in the simplified model
through the HMM allows predictions of resource usage to be provided at any time
from partial sequences. The HMM developed here is generic enough to be used for
other general problems, such as generating sequences of user behavior based on
their previous behavior as represented in Section 4.1, provided that the state
transition probabilities are obtained from the state transitions of the total
sequence in the training phase after uncovering the hidden states.

6 Experimental Results 

The models are trained with data collected from four different users. The
number of actions in each reference file varies from 429 to 1948, and the data
were collected over periods of various lengths.

Fig. 6. Class of Users with $\alpha$ and $\beta$



The class/type of each user with regard to the use of a particular resource can
be computed from the probabilistic models in terms of $\alpha$ and $\beta$, as represented
in Figure 6. User 1 never uses the printer within the training data; user 2 uses
the memory, printer, and router resources, in that order; user 3 uses the router
most; and user 4 uses every resource the least among the users. For all of the
users, the percentage of use of the particular resources is rather low.

The information on user preferences is obtained from the extracted predictive
patterns of how each user executes a plan. The preferences are demonstrated not
only by the order of actions in a sequence but also by the length of the
sequence, including the repetitive actions.

Fig. 7. Prediction Hit Ratio in Different Methods with Different Models.
User 2 (training : testing = 60 : 40 vs 90 : 10)

Predictions of resource use are made at a given time from the observation of a
partial sequence PS. The prediction accuracy is tested with different ratios of
training to testing data, namely 60 to 40 and 90 to 10. Taking the user 2 model as
an example, the prediction hit ratio in Figure 7 is measured by looking ahead in
the testing data at the predicted results, knowing only which resource uses should
be predicted as likely, not what the exact prediction values should be. It
should be noted that the criterion of prediction accuracy for this comparison
is rather generous. There is no definite answer or standard by which a predicted
probability is correct or incorrect (a prediction of 99.99% can still be wrong in
practice), so the measurement of prediction accuracy here is not about whether
predicting 70% is better than predicting 60%, but about predicting the particular
plans, that is, predicting the right resource uses. In addition to the comparison
between the ideal and simplified models, we examined a labeling method for
learning predictive patterns with the same data. The labeling method excludes the
use of contextual temporal information, that is, finding and using correlations
of actions, and so serves to evaluate the segmentation and labeling method. For
observed behavior, the labeling method in the ideal model shows the lowest
performance, while the labeling method in the simplified model gives fairly good
predictions for this particular user. For unobserved behavior, however, the
labeling method predicts everything wrong in both models, whereas detecting
irrelevant actions and grouping actions by correlation bears out the segmentation
and labeling method in both models.

Fig. 8. Prediction Accuracies of Bigram and Trigram: (a) bigram, (b) trigram

We also examined a purely statistical approach, n-grams [12], which has been
used in general as a method for prediction problems, with the same data. The
prediction problems addressed by the two approaches differ: the purely
statistical approach, as in [12,14], predicts the very next behavior, while our
problem is to predict resource use in upcoming actions. We nevertheless
investigate the purely statistical approach and compare the results in order to
have a baseline for measuring the predictive ability of our approach. The result
of the comparison is used to evaluate the advantage of combining the symbolic and
numerical approaches.

The statistical approaches tested for evaluation purposes are first-order
(bigram) and second-order (trigram) Markov chains. Figure 8 presents the
prediction accuracy of the statistical approaches: (a) for bigram and (b) for
trigram. The labels 90% and 60% denote the proportion of the data used for
training. The bar graph represents the prediction of the very next behavior, and
the line graph shows the prediction of resource uses as very next behaviors.

Except for User 1, the 90% training model predicts better than the 60% training
model; the reason for User 1 could be a different or peculiar set of behaviors
during the particular period covered by the 10% test set. Bigram models outperform
trigram models in predicting the next behavior in Figure 8. It is commonly
supposed that having more information is better than having less. Since trigram
models have one more previous action as history than bigram models, one would
expect trigram models to outperform bigram models. A major problem with this
assumption is sparse data: the current observations contain new trigrams that
were never observed in training. Given the characteristics of the UNIX domain,
where both non-strict temporal orderings of actions and extraneous actions are
common, the better performance of bigram models can be explained as follows:
when action sequences contain frequent extraneous actions and only sequential
information is taken into account, few correlations among actions are captured.
User 3 has high prediction accuracy in the statistical models since the observed
behavior is simple and contains many repetitive actions in short patterns, such
as (from mail) actions.
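The sparse-data effect is easy to see in a minimal n-gram predictor. The sketch below is a generic first-order/second-order Markov chain over command names, not the implementation evaluated in the experiments; the function names and the toy command stream are hypothetical.

```python
from collections import Counter, defaultdict


def train_ngram(actions, n):
    """Count which action follows each history of n - 1 actions."""
    model = defaultdict(Counter)
    for i in range(len(actions) - n + 1):
        history = tuple(actions[i:i + n - 1])
        model[history][actions[i + n - 1]] += 1
    return model


def predict_next(model, history):
    """Most frequent successor of the history, or None if it was never seen."""
    counts = model.get(tuple(history))
    return counts.most_common(1)[0][0] if counts else None


stream = ["cd", "ls", "latex", "prtex", "cd", "ls", "more", "cd", "ls", "latex"]
bigram = train_ngram(stream, 2)
trigram = train_ngram(stream, 3)

print(predict_next(bigram, ["ls"]))           # latex
print(predict_next(trigram, ["cd", "ls"]))    # latex
# An extraneous action in the history defeats the trigram but not the bigram:
print(predict_next(trigram, ["more", "ls"]))  # None (history never observed)
print(predict_next(bigram, ["ls"]))           # latex
```

A longer history is matched exactly or not at all, so a single interleaved or extraneous command leaves the trigram model without a prediction, while the bigram model still has counts for the most recent command.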

For predictions of future resource use to be useful, they should concern upcoming
demands for resources and should focus on short-term behavior, rather than
predicting resource use after observing, say, a stream of 100 actions. We computed
the average length of the learned patterns in each plan that uses a resource,
along with the length of the longest pattern, to support short-term prediction
with bounded lengths of commands, as given in Table 1.

Table 1. The Bounded Length of Commands in Predictions

The bounded length can be used to provide more accurate predictions of
upcoming resource use demands.

The prediction accuracies of our models are shown in Figure 9.

Fig. 9. Prediction Accuracies of Both Ideal and Simplified Models

In Figure 9, the bar graphs represent prediction accuracies over the total number
of predictions in both models, and the line graphs represent prediction accuracies
for resource uses only. For predicting patterns, all of the 90% training models
perform better than all of the 60% training models. While both our approach and
the statistical approaches measure prediction accuracy by looking ahead at the
predicted results using action sequences taken from the test sets, the measurement
differs between the approaches, since each addresses a different prediction
problem. The results are based on the same real data gathered from the same users.
The overall results in Figure 8 and Figure 9 show that the predictions from
our approach of learning patterns using correlations have higher accuracies than
the purely statistical models, and the difference is particularly marked in
predicting resource usage.

7 Discussion 

Reactive and intelligent interface agents have been developed in a real-world
application, solving plan recognition problems using user models. A data filtering
tool was also developed for automatic on-line data collection; it encodes
information from observations, capturing both sequential and relational
information [11], and was used with four individuals. These data are used for
off-line analysis and evaluation of the predictions. In the experimental results,
the simplified models outperform the ideal models, for two reasons: it is easier
to obtain reliable probabilities from the observations in the simplified model,
and, because there are many variations of the same plan in this domain, keeping
the strict order of actions in an observed sequence lowers the prediction
accuracy. There are also many actions within a whole sequence of observations that
are irrelevant to the use of resources; segmenting the sequences so as to exclude
those extraneous actions helped to learn the patterns. The results also
demonstrate how working with real data affects the view of both theory and
practice, as a real factor in how an interface agent deals with the real world.
When individual differences are more evident, as in some domains, individual user
models can also be beneficial for user recognition/identification problems.
Although the data in our experiment are undoubtedly too few to cover all possible
sets of observations or to draw conclusions about the formal models proposed,
individual differences are demonstrated by the gathered data. In summary, our
approach achieved the automatic acquisition of particular plans from users,
results that support building individual models based on individual differences,
and the development and verification of the models based on real observations.
Additionally, we are building an aggregated model in order to compare its actual
results with those from the individual models.

Abstract. In this paper we propose a system named Gleams of People,
for monitoring the presence and the status of people. The concept,
architecture, and implementation issues are described. This system is one
step towards new network communication tools with the following features:
they are non-disturbing, they offer simple and intuitive messaging/signaling,
they do not necessarily rely on written or spoken language, and they retain
the feeling of being “connected.”

The system is designed as a multi-agent system in which a personal agent
is assigned to each person. The personal agent can be considered to
deal specifically with the social activities and relations of a person. Such
agents play an important role in communityware (or socialware), which aims
to support future network communications and communities. Gleams of
People can be a good sample application of socialware.



1 Introduction 

With the progress of information technology, our daily lives are becoming more
and more “connected.” E-mail and Web access have already become popular:
for example, applying for jobs via the Internet is becoming common. Another
example of being “connected” is mobile telecommunication devices, such as
cellular phone services and the personal handyphone system (PHS). They have
been spreading widely, especially among young people. This trend will be
accelerated by home networking devices, which are currently a hot area of research
and development. Thus, we are becoming more “connected” to other people,
communities, and society, anywhere and at any time.

Network communication tools such as e-mail and Web boards are at the center
of such connections. Even PHS provides a “short mail service” that can be
gatewayed to Internet mail. While these tools are very convenient, they are not
completely satisfactory. To see what is missing in current network communication
tools, we introduce two episodes.

Episode 1. Evidence of aliveness for aged victims after the Great
Hanshin-Awaji Earthquake.

After the Great Hanshin-Awaji Earthquake in 1995, many victims needed to
live in temporary houses. Among them was a non-negligible number of aged
people living alone, who occasionally needed care. Because they sometimes
fell into difficulties in which they could not call for help themselves,
social workers needed to visit them often. This situation irritated some of
the aged people, since it conflicted with their independence and since they
felt sorry about the social workers' efforts on their behalf. Then they found
a way to solve the problem: a sign of aliveness. When a ribbon was on the
door, the resident was OK and care was not needed. Since they put a ribbon
out every morning, a missing ribbon meant that the resident might need
some care.

Episode 2. A young person's usage of the PHS short mail service.

There was a TV program in which young people discussed how to use (or
not to use) personal telecommunication devices, especially the PHS short mail
service¹. One young person reported that he had tended to become eager to
receive messages, even though he knew that most of the messages were
mere idle chat. After noticing this tendency, he changed his attitude and
kept some distance between himself and short mail, so that he would not “be
abused by PHS.” Even after he learned how to use it, his main use of short
mail was for idle chat. He thought that such messages were still fun, but
not so important as to call someone or to have a face-to-face conversation.
Furthermore, he found that short mail was suitable for keeping connections
with friends whom he had not met for a long time: sending a short mail with
just a brief greeting such as “How's life?” was enough, and pleasant for both
sender and receiver.

These episodes indicate that we need communication tools (or methods) with
the following features:

- Non-disturbing, for people on both sides of the communication.

- Very simple, intuitive messaging (signaling), for example just indicating
aliveness as in the episodes.

- Easier and handier to use, not necessarily involving written or spoken language
(written language is required in e-mail or Web conferencing systems).

- Retaining the users' feeling of being “connected” to other people.

Such features do not seem to be well supported in existing network communication
tools. Therefore, as an example of a new network communication tool with
these features, we propose a system called Gleams of People, for monitoring the
presence and the status of people.

This paper is organized as follows. In Section 2, as background for the system
we propose, we explain the notion of socialware, which is an active research area
in the multi-agent field. In Section 3, we discuss the multi-agent architecture for
the presence monitor in detail. We further discuss the implementation of Gleams
of People and the relationship between the system and “Shine,” a multi-agent
platform for socialware. In Section 4, we present some related work and mention
our future work.

^ A service that very limited (in length and characters) text message can be exchanged 
between each PHS devices. 
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2 Gleams of People as an Application of Socialware 

Because the objective of the system is a very personal one, and thus does not
fit a conventional client-server architecture, Gleams of People adopts a
multi-agent architecture. In this architecture, the personal agents, each of
which belongs to an individual person, communicate with each other. The tasks
of each agent are not limited to mediating communication between persons.
Rather, the tasks are considered to be social ones, in which an agent needs to
deal with the social activities and relations of its user.

Recently, in the multi-agent research field, the social aspects of agent
systems have been attracting attention [4,9]. Within this work there is a
research area called communityware (or socialware) [3,5,12], which aims at
supporting

— formation, maintenance and evolution of network communities, and 

— communications exploiting advantages of network communities. 

In the case of Gleams of People, exchanging presence and status information
can be considered a very primitive sort of communication. Furthermore, one can
consider the friends of a person as a “community” from his/her own point of
view. Thus Gleams of People is an application of socialware (in a broader
sense), and it demonstrates how an agent deals with social activities and
relations over the network. Other applications of socialware can be found in
[15,8,13], for example.

As an application platform for socialware, we are developing a multi-agent
platform named Shine [16]. Shine is intended to integrate common functions
required by socialware applications, to ease cooperation among applications,
and to allow program modules to be shared and reused in application
development. For these purposes Shine is being developed as a Java class
library. Gleams of People is implemented on top of the Shine framework, as
described in Sect. 3.3.

3 A Multi-agent Architecture for the Presence Monitor 

We now describe the presence monitor system, Gleams of People, which provides
presence and status information among users.

The function is similar to “Who’s online,” which is often provided by online
community services and applications such as ICQ or AOL Instant Messenger
(AIM). Such “Who’s online” services provide the current status of users, such
as online/offline, busy/left the seat/idle, and accept/reject. This
information is typically used to decide whether or not to start communication.
That is, users are expected to start communication by other methods after
using the service. In contrast, we do not assume that communication will be
started by other methods after using our presence monitor. The aim of our
system is to provide the presence and status information itself, as a very
simple, intuitive message.

There exist some services on the net that provide personal information, such
as whois^ and Bigfoot^. Several portal sites also provide a “person finder.”
However, these services are rather static (and often out of date), and status
information for each person is generally missing.



3.1 The Presence Monitor: Gleams of People 

Intuitively, Gleams of People is a system for /sbin/ping^ among people (more
precisely, among personal agents). Unlike the ordinary ping, a “ping” message
in Gleams of People carries two additional pieces of information. One is a
“color,” which represents the current “mood” of the sender. The other is a
“level,” which indicates whether the ping was issued automatically by the
system or explicitly by the user.

The Gleams of People system consists of personal agents for individual users
and a repeater agent (for scalability, multiple repeater agents can be
clustered). Each personal agent maintains the sets of friends of its user,
mediates the transmission of presence and status information on behalf of the
user, and displays this information to the user. A repeater agent can be
considered a shared “buffered repeater.” Personal agents use it to store and
forward information when the destination agent cannot be found or is offline.
The repeater agent is needed because information sent while a user is offline
is still valuable as presence and status information about others, provided it
is not too out of date. Moreover, the fact that transmission is not always
immediate may reassure users that communication in this system is not very
disturbing.

From a user’s point of view, Gleams of People works as follows:

1. A user invokes his/her personal agent. The initial screen (which is the same
as in the user’s last session) appears.

2. The user selects his/her “current color” (“mood”) and a friend set to display.

3. The system initializes:

— It performs a level one ping to each member of the current friend set.

— It contacts the repeater to check whether any pings were stored while the
agent was offline.

4. The screen is redisplayed according to the responses from the previous step.

5. When a ping from another agent is received, the agent:

— Decides whether or not to respond (according to the rules given), and
responds if necessary.

— Finds an appropriate friend set and updates the database.

— Redisplays the user’s screen if the ping was from a member of the current
friend set.

6. When the user initiates a ping to a member of the current friend set, the
agent:

— Performs a level two ping to the destination.

— Redisplays the screen according to the response (if any).

7. When the user edits the friend set or selects another friend set, the agent
updates the data and redisplays the screen according to the instructions.

8. Steps 5, 6 and 7 are repeated.

9. On exit, the agent:

— Performs a level one ping to each member of the current friend set.

— Waits a while for replies and redisplays the screen, then saves the current
configuration and exits.

^ RFC 954. Extended whois++ is described in RFC 1835, 1913 and 1914.

^ http://www.bigfoot.com/

^ ICMP echo request, described in RFC 792.

Fig. 1. Screen image of Gleams of People

Figure 1 is the screen image of Gleams of People. Light circles represent the
members of the current friend set. Each circle gleams, in the color of the
ping, when a ping from the friend is received. This also happens when a
response to a ping sent by the user is returned. Furthermore, a circle gleams
at a certain interval, which is computed from the past records of pings
(frequency, direction, time and so on). In Fig. 1, the circle labeled “Ko-ji”
is gleaming in this way.

A circle gleams more strongly for a level two ping and more weakly for a level
one ping. Although the difference between ping levels is whether the ping is
system-initiated or user-initiated, users might find another meaning in them:
they can naturally interpret a level one ping as “I’m here” and “Are you
there?”, and a level two ping as “He/She cares about me” and “I care about
you.” In Fig. 1, the circle labeled “Kazuhiro” indicates a level one ping,
while “Sen” indicates a level two ping.

The user can initiate a level two ping to a friend by double-clicking the
circle for that friend.

3.2 The Architecture of Gleams of People 

The repeater agent simply stores each ping for a certain period of time
(specified by the originator). Each ping is tagged with its originator and
destination, and only the most recent one for each pair is stored (i.e., older
information is overwritten). When the destination agent contacts the repeater,
the repeater forwards the stored pings to the destination.
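
The store-and-forward behavior of the repeater can be illustrated with a short
sketch. It is written in Python purely for illustration (the system itself is
built in Java on Shine), and every name in it (Ping, Repeater, store, collect,
ttl) is a hypothetical choice rather than part of the actual implementation.
Time is passed in explicitly, in seconds, so that the example stays
deterministic.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Ping:
    originator: str   # agent that sent the ping
    destination: str  # agent the ping is addressed to
    color: str        # sender's current "mood"
    level: int        # 1 = initiated by the system, 2 = initiated by the user
    sent_at: float    # send time, in seconds (assumed unit)
    ttl: float        # storage period chosen by the originator, in seconds


class Repeater:
    """Buffered repeater keeping only the latest ping per (originator, destination)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Ping] = {}

    def store(self, ping: Ping) -> None:
        # A newer ping for the same pair overwrites the older one.
        self._store[(ping.originator, ping.destination)] = ping

    def collect(self, destination: str, now: float) -> List[Ping]:
        """Called when `destination` contacts the repeater.

        Removes every ping stored for that agent and returns the ones
        whose storage period has not yet expired.
        """
        keys = [k for k in self._store if k[1] == destination]
        pending = [self._store.pop(k) for k in keys]
        return [p for p in pending if now - p.sent_at <= p.ttl]
```

In this sketch expired pings are discarded lazily, at the moment the
destination contacts the repeater; a periodic clean-up would be an equally
valid design.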

The personal agent consists of the following modules: User interface, Friends
database, Relation manager, Planner and Communication module. These are
illustrated in Fig. 2.

User interface. The appearance of the user interface is given in Fig. 1. It is
designed to give users a sense of being “connected” with others.

For the purpose of “current color” (“mood”) selection, a special kind of color
picker based on the Practical Color Coordinate System (PCCS) has been
developed. Unlike the 3-dimensional RGB color mixing system or the HSV color
appearance system, PCCS is a 2-dimensional color ordering system (see e.g. [2]
for general information on color systems). Its hue-tone combination^ seems to
be more convenient for selecting a color that may be bound to a user’s mood.

Friends database. The friends database consists of sets of friends. Each
friend set has several members, chosen by the user from among his/her friends.
Multiple friend sets are provided to cope with the user’s context: for
example, one friend set for business-related friends, one for sports-related
friends, and one for hobby-related friends.

Each member in a friend set is expressed by the following data: 

Name: The name used as an identifier of the friend. 

Time, Mood and Level: The time, mood, and level of the most recent ping
received from the friend.

Location list: The logical network locations of the friend’s agent. If the
friend has multiple locations, the likelihood of each location (induced
statistically) is also recorded.

Frequency and direction: How frequently pings were exchanged with the
friend, and in which direction.

The friends database provides the basic data necessary for Gleams of People.
Social aspects, such as the relationship between friends and the user, are
inferred from these data by the relation manager.
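
As a rough illustration, a member record of this kind could be represented as
follows. This is only a sketch in Python, not the Shine user model: the class
and field names are hypothetical, and the likelihood of each location is
assumed here to be the relative frequency of pings observed from that
location, which is one simple reading of “induced statistically.”

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Friend:
    name: str                              # identifier of the friend
    last_time: Optional[float] = None      # time of the most recent ping received
    last_mood: Optional[str] = None        # mood (color) of that ping
    last_level: Optional[int] = None       # level of that ping (1 or 2)
    location_counts: Dict[str, int] = field(default_factory=dict)
    pings_received: int = 0                # direction: friend -> user
    pings_sent: int = 0                    # direction: user -> friend

    def record_incoming(self, time: float, mood: str, level: int,
                        location: str) -> None:
        """Update the record when a ping arrives from `location`."""
        self.last_time, self.last_mood, self.last_level = time, mood, level
        self.location_counts[location] = self.location_counts.get(location, 0) + 1
        self.pings_received += 1

    def location_likelihoods(self) -> Dict[str, float]:
        """Relative frequency of each known location (assumed estimator)."""
        total = sum(self.location_counts.values())
        if total == 0:
            return {}
        return {loc: n / total for loc, n in self.location_counts.items()}
```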

Relation manager. The relation manager maintains relations among friends,
and between friends and the user. Here we do not consider complex relations
among friends: the relation manager mainly deals with relations between the
user and his/her friends.

^ “Vivid - Orange” and “Dull - bluish Green” are examples of tone-hue combinations.

Fig. 2. Architecture of personal agent in Gleams of People

Main functions of the relation manager are:

— Compute where a particular friend resides, how frequently, and when (if
the friend resides in multiple locations). This reflects the behavior pattern
of the friend.

— Find out whether an incoming ping is from a known friend. If not, infer
which known friend is closest to the newcomer^ and decide in which friend set
the newcomer should be placed, or prompt the user to make the selection.

— Compute how frequently pings were exchanged with a particular friend,
and in which direction. This reflects the closeness between the user and
the friend, and the degree of bi-directionality of the relation between
them. As special cases, the user can instruct the relation manager not
to respond to a particular friend (i.e., add the friend to a blacklist), or
to always respond with a fixed mood (for example, to a busybody).

— Compute which pairs of friends are statistically correlated. This
information has little meaning for Gleams of People itself, but might be
helpful when other socialware applications try to find the relationships
between friends, or to obtain the profiles of friends externally.
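
The third function can be made concrete with a small sketch. The text does
not define how bi-directionality is measured, so the ratio used below is only
an assumed example, and the function names and the blacklist and fixed-mood
containers are hypothetical.

```python
from typing import Dict, Optional, Set


def bidirectionality(sent: int, received: int) -> float:
    """Assumed measure: 1.0 if pings flow equally both ways, 0.0 if one-way."""
    if max(sent, received) == 0:
        return 0.0
    return min(sent, received) / max(sent, received)


def reply_mood(sender: str, current_mood: str,
               blacklist: Set[str], fixed_moods: Dict[str, str]) -> Optional[str]:
    """Return the mood to reply with, or None if the agent should stay silent.

    `blacklist` holds friends the user never responds to; `fixed_moods` maps
    a friend to the mood that is always returned to that friend.
    """
    if sender in blacklist:
        return None
    return fixed_moods.get(sender, current_mood)
```

A blacklisted sender takes precedence over a fixed mood in this sketch; the
opposite priority would be just as easy to express.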

Planner. When an outgoing ping to a particular friend is initiated, the
planner selects the most likely location of the friend as the destination.
When a timeout occurs, it selects the second choice. If all possible locations
time out, the ping is sent to the repeater as a last resort. This part is
naturally implemented using a finite state machine model (see for example
[7]).

When an incoming ping is received, the planner decides whether the originator
is a known friend. If so, it finds the friend set to which the originator
belongs. Then it decides whether the agent should respond, and how. This part
will naturally be implemented as a rule-based system.

Such selections and decisions are made in cooperation with the relation
manager.
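
The fallback behavior for outgoing pings can be sketched as a simple linear
state machine: one state per candidate location, ordered by likelihood, with
the repeater as the final state. The code below is an illustrative Python
sketch under that assumption; REPEATER, plan_destinations, send_ping and the
try_send callback are hypothetical names, and a real transport with timeouts
is replaced by a callback that returns True on a reply and False on a timeout.

```python
from typing import Callable, Dict, List

REPEATER = "repeater"  # hypothetical address of the repeater agent


def plan_destinations(likelihoods: Dict[str, float]) -> List[str]:
    """Order locations from most to least likely; the repeater comes last."""
    ranked = sorted(likelihoods, key=lambda loc: likelihoods[loc], reverse=True)
    return ranked + [REPEATER]


def send_ping(likelihoods: Dict[str, float],
              try_send: Callable[[str], bool]) -> str:
    """Try each destination in turn and return the one that accepted the ping.

    Each loop iteration corresponds to one state of the machine; a timeout
    (try_send returning False) is the transition to the next state.
    """
    for dest in plan_destinations(likelihoods):
        if try_send(dest):
            return dest
    raise RuntimeError("no destination reachable, not even the repeater")
```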

Communication module. This module is in charge of the actual transmission of
pings. The implementation of the transmission protocol depends on lower
modules. Possible protocols are HTTP (tunneling), an original protocol on
TCP/IP, RPC, and Java RMI. Given the nature of the application, non-persistent
connections are desirable. Currently we use Java RMI because it fits naturally
into the framework we use. In the future it will be possible to select the
protocol on instruction from the planner.

3.3 The Implementation of Gleams of People, in Relation to Shine 

As mentioned in Sect. 2, Gleams of People is implemented on top of the Shine
framework. It is also a simple but interesting application of Shine.

The Shine framework has two layers. The Shine layer provides common functions
required by agents that deal with social activities and relations. On top of
the Shine layer there is the application layer, which provides domain-specific
functions for each application. The Shine layer supports each module as
follows:

— Linking the user’s community feeling and the system’s logical information.

— Analyzing the features, roles and situations of each person in the context
of the community.

— Adapting to dynamic changes in acquaintance relations and group formations.

— Providing user models and user data objects.

^ This cannot be done using the friends database of Gleams of People alone: other
information sources are required for a meaningful induction. It will become
possible, for example, if Gleams of People works in cooperation with other
socialware applications such as “Community Organizer” [15], which holds feature
vectors of friends.

Currently we have built an initial prototype of Gleams of People. The
implementation status of each module and its relation to Shine are described
below.

User interface. The initial implementation is completed. The color picker
based on PCCS will be incorporated back into Shine, as one of the methods for
representing the user’s mood. The placement of each circle (friend) is
currently left to the user. In the future it will be possible to provide
another placement method from the Shine layer to enhance community feeling.
An example of such a method is multidimensional scaling (based on features
of friends), which is used in “Community Board” [8].

Friends database. The friends database makes use of one of the user models
provided by Shine. The model used in Gleams of People describes each user by
the following:

— Basic identity information such as name and locations.

— Communication log (which corresponds to the frequency, time and
direction of pings).

— Categorization according to some criteria (which corresponds to the
friend set).

Note that these are basic data that need little interpretation. In the Shine
framework, more abstract information that needs some interpretation, such as
social relationships, is computed by the relation manager (though in the case
of Gleams of People, the relation manager does not need to compute much
abstract information).

Relation manager. In the Shine framework, the relation manager is responsible
for computing abstract information related to social activity, as mentioned
above. However, the current implementation of the relation manager in Gleams
of People is very limited. It can only compute some statistical information on
the frequency of pings, the locations of friends and so on. The mechanism for
inferring the friend set in which a newcomer should be placed is not
implemented yet.

As described in the previous section, this inference cannot be done well
without outside information sources, possibly from other socialware
applications. Furthermore, the statistical information available in Gleams of
People might be useful in such applications as well. Thus, information
exchange and integration among socialware applications using the Shine
framework is an important part of our future work.

Planner. In the Shine framework, the planner breaks down an abstract action
requested by the relation manager into feasible executions. We have
implemented a basic state machine that is used to find appropriate
destinations for outgoing pings. A rule-based system, which will be used to
decide whether the agent should respond to incoming pings, is work in
progress.

Communication module. The communication module does not necessarily need to
know the lower protocols, since the Shine layer provides them. Currently we
only have Java RMI as a Shine layer protocol. We are planning to switch to the
Java Shared Data Toolkit (JSDT) as the Shine layer protocol suite. It includes
several of the protocols mentioned in the previous section.



4 Related Works and Future Direction 

In this paper we described Gleams of People, a system for monitoring the
presence and the status of people. This system is one step toward widening the
capabilities of network communication, in the sense that it provides a new
network communication tool that is non-disturbing, offers simple and intuitive
messaging/signaling, does not necessarily rely on written or spoken language,
and retains the users’ feeling of being “connected.” Systems in this direction
will play important roles in future network communication.

There are several related works in this direction. 

Research and development on “multimedia” applications is related to this
issue in the sense that such applications convey nonverbal communication over
the network. Socia [14] and FreeWalk [10] are good examples along this line.
While these applications are powerful in multimodal communication, they are
not ultimate solutions: for example, many users of TV conference systems
complain about the lack of liveness and feel that they are “separated into
both sides.” Furthermore, these systems require high bandwidth.

Another type of related work extracts and makes use of physical data about
the users. An MIT group has started studying affective computing^, which tries
to extract emotions and affect signals from biological and physiological data
of the users, and to use these emotions and affect signals to support
communication and other computing tasks. Ishii’s Tangible Bits research
project^ also makes use of physical (but not physiological) data as the core
information to be conveyed [6]. In particular, inTouch [1] explores a new form
of interpersonal communication through touch, via the movement of a “shared”
object that can be touched and moved by geographically distributed users.

Also related is the development of social conventions among users for
representing nonverbal information. A good, old example is the “face mark”
(“smiley”). Several face marks have been developed among netters^, such as
:-) and ;-p, among others, which can be found in e-mail, Usenet news and so
on. Furthermore, especially in Japan, the use of symbol characters embedded in
text has become common among the younger generation to express the writer’s
feelings.

^ http://www.media.mit.edu/affect/

^ http://tangible.www.media.mit.edu/groups/tangible/

^ netter n. 1. Loosely, anyone with a network address. 2. More specifically, a
Usenet regular. ... (from The Jargon File, version Jf.A.2)

We have implemented the initial prototype of Gleams of People. It is still in
development together with Shine, a multi-agent platform that serves as the
basis of socialware. Besides the development, further issues to be studied
include:

— What else can we convey as simple, intuitive information?

Physiological data themselves, such as a heartbeat, might be one candidate.

— How should the interface be designed?

One design principle behind Gleams of People is the KISS (Keep It Simple and
Stupid) rule. However, there are many ways to design simple and intuitive
interfaces. Haptic interfaces, which can be found in embodied agents [11], may
give some hints.

— What is a good strategy for evaluating systems like this? How should
experiments be planned and carried out?

One intuition that motivated Gleams of People is that “we do not always need
to talk.” Experimental evaluations based on social science, including tests of
this hypothesis, need to be investigated further.

We will continue to study these issues, as well as implementing and improving
Shine, Gleams of People and other socialware applications.



References 

1. Brave, S., Ishii, H., and Dahley, A.: Tangible Interfaces for Remote Collaboration
and Communication. In Proceedings of CSCW’98. ACM (1998) pp. 169-178.

2. The Color Science Association of Japan (Ed.): Handbook of Color Science.
University of Tokyo Press (1998) (in Japanese).

3. Hattori, F., Ohguro, T., Yokoo, M., Matsubara, S., and Yoshida, S.: Socialware:
Multiagent Systems for Supporting Network Communities. Commun. ACM 42,3
(Mar. 1999) 55-61.

4. Ishida, T. (Ed.): Community Computing - Collaboration over Global Information
Networks -. John Wiley & Sons (1998).

5. Ishida, T. (Ed.): Community Computing and Support Systems. Springer-Verlag
(LNCS 1519) (1998).

6. Ishii, H., and Ullmer, B.: Tangible Bits: Towards Seamless Interfaces between
People, Bits, and Atoms. In Proceedings of CHI’97. ACM (1997) pp. 234-241.

7. Kuwabara, K.: Meta-Level Control of Coordination Protocols. In Proceedings of
ICMAS’96. IEEE (1996) pp. 165-172.

8. Matsubara, S., and Ohguro, T.: CommunityBoard 2: Mediating between speakers
and an audience in computer network discussions. In Proceedings of Agents ’99.
ACM (1999) pp. 370-371.

9. Nagao, K., and Takeuchi, A.: Social interaction: Multimodal conversation with
social agents. In Proceedings of AAAI’94. The MIT Press (1994) pp. 22-28.

10. Nakanishi, H., Yoshida, C., Nishimura, T., and Ishida, T.: FreeWalk: Supporting
casual meetings in a network. In Proceedings of CSCW’96. ACM (1996) pp. 308-314.



Gleams of People: Monitoring the Presence of People 181 



11. Naya, F., Yamato, J., and Shinozawa, K.: Recognizing Human Touching Behaviors
using a Haptic Interface for a Pet-Robot. In Proceedings of SMC ’99. IEEE (to
appear).

12. Nishida, T., Takeda, H., Iwazume, H., Maeda, H., and Takaai, M.: The
knowledgeable community. In Proceedings of Knowledge-based Intelligent Electronic
Systems (KES’98). IEEE (1998) pp. 23-32.

13. Nishimura, T., Yamaki, H., Komura, T., Itoh, N., Gotoh, T., and Ishida, T.:
Community Viewer: Visualizing community formation on personal digital assistants.
In Proceedings of IJCAI’97 Workshop on Social Interaction and Communityware.
Morgan-Kaufmann (1997) pp. 25-30.

14. Yamaki, H., Kajihara, M., Tanaka, G., Nishimura, T., Ishiguro, H., and Ishida,
T.: Socia: Non-Committed Meeting Scheduling with Desktop Vision Agents. In
Proceedings of PAAM’96. The Practical Application Company (1996) pp. 727-742.

15. Yoshida, S., Kamei, K., Yokoo, M., Ohguro, T., Funakoshi, K., and Hattori, F.:
Visualizing Potential Communities: A Multiagent Approach. In Proceedings of
ICMAS’98. IEEE (1998) pp. 477-478.

16. Yoshida, S., Ohguro, T., Kamei, K., Funakoshi, K., and Kuwabara, K.: Shine: a
Cyber-community Application Platform - A Proposal -. Appears in PRIMA’99.



Distributed Fault Location in Networks Using Learning Mobile Agents



Tony White and Bernard Pagurek

Systems and Computer Engineering, Carleton University,
1125 Colonel By Drive, Ottawa, Ontario, Canada K1S 5B6
{tony, bernie}@sce.carleton.ca



Abstract. This paper describes how multiple interacting swarms of adaptive 
mobile agents can be used to locate faults in networks. The paper proposes the 
use of distributed problem solving using learning mobile agents for fault 
finding. The paper uses a recently described architectural description for an 
agent that is biologically inspired and proposes chemical interaction as the 
principal mechanism for inter-swarm communication. Agents have behavior 
that is inspired by the foraging activities of ants, with each agent capable of 
simple actions; global knowledge is not assumed. The creation of chemical 
trails is proposed as the primary mechanism used in distributed problem solving 
arising from the self-organization of swarms of agents. Fault location is 
achieved as a consequence of agents moving through the network, sensing, 
acting upon sensed information, and subsequently modifying the chemical 
environment that they inhabit. Elements of a mobile code framework that is 
being used to support this research, and the mechanisms used for agent mobility 
within the network environment, are described. 



1 Introduction 

The telecommunication networks that are in service today are usually conglomerates 
of heterogeneous, very often incompatible, multi-vendor environments. Management 
of such networks is a nightmare for a network operator who has to deal with the 
proliferation of human-machine interfaces and interoperability problems. Network 
management is operator-intensive with many tasks that need considerable human 
involvement. Legacy network management systems are very strongly rooted in the 
client/server model of distributed systems. This model applies to both IETF [1] and 
OSI [2] standards. In the client/server model, there are many agents providing access 
to network components and considerably fewer managers that communicate with the 
agents using specialized protocols such as SNMP or CMIP. The agents are providers 
(servers) of data to analyzing facilities centered on managers. Very often, a manager 
has to access several agents before any intelligent conclusions can be inferred and 
presented to human operators. The process often involves substantial data 
transmission between manager and agent that can add a considerable strain on the 
throughput of the network. The concept of delegation of authority has been
proposed [3] to address this issue. Delegation techniques require an appropriate
infrastructure that provides a homogeneous execution environment for delegated
tasks. One approach to the problem is SNMPscript [4]. However, SNMPscript has
serious restrictions related to its limited expression as a programming language and to
the limited area of its applicability (SNMP only). Although delegation is quite a
general idea, the static nature of management agents still leaves considerable control
responsibility in the domain of the manager. Legacy network management systems
tend to be monolithic, making them hard to maintain and requiring substantial
software and hardware computing resources. Such systems also experience problems
with the synchronization of their databases and the actual state of the network.
Although the synchronization problem can (potentially) be reduced in severity by
increasing the frequency of updates or polling, this can only be achieved with further
severe consequences on the performance of the system and the network.

H. Nakashima, C. Zhang (Eds.): PRIMA’99, LNAI 1733, pp. 182-196, 1999.

An emerging technology that provides the basis for addressing problems with 
legacy management systems is network computing based on Java. Java can be 
considered a technology rather than merely as another programming language as a 
result of its 'standard' implementation that includes a rich class hierarchy for 
communication in TCP/IP networks and a network management infrastructure. Java 
incorporates facilities to implement innovative management techniques based on 
mobile code [5]. Using this technology and these techniques it is possible to address 
many interoperability issues and work towards plug-and-play networks by applying 
autonomous mobile agents that can take care of many aspects of configuring and 
maintaining networks. For example, code distribution and extensibility techniques 
keep the maintainability of networks and their management facilities under control. 
The data throughput problem can be addressed by delegation of authority from 
managers to mobile agents^ where these agents are able to analyze data locally without 
the need for any transmission to a central manager. We can limit the use of processing 
resources on network components through adaptive, periodic execution of certain 
tasks by visiting agents. The goal is to reduce, and ultimately remove, the need for 
transmission of a large number of alarms from the network to a central network 
manager. In other words, our research focuses on proactive rather than reactive 
management of the network. 

While Java technology provides a device-independent agent execution
environment, the use of mobile code in network management, and the use of groups of
agents in particular, raises a number of issues that must be addressed. First, how
is communication between agents achieved? Second, what principles guide the
migration patterns of agents or groups of agents moving in the network? Finally, how
are groups of agents organized in order to solve network-related problems? These
questions motivate the research reported in this paper.

The remainder of this paper is organized in the following way. First, we briefly
describe an infrastructure for mobile code that has been designed and implemented in
Java. A mobile code taxonomy is then presented. The essential principles of Swarm
Intelligence (SI) and, in particular, how an understanding of the foraging behaviors of
ants [6] has led to new approaches to control and management in telecommunications
networks are then reviewed. An agent architecture utilizing mobile code for the
localization of network faults is then provided, along with an example of its use in a
network scenario. The paper then concludes with a review of important messages
provided and a review of planned future activities.

^ The terms "mobile agent" and "mobile code" will be used interchangeably throughout this
paper.



2 Mobile Code Environment (MCE) 

A homogeneous execution environment for mobile code is considered extremely 
advantageous for the agent-based management of heterogeneous networks. Typically, 
an MCE contains the following components [7]: a mobile code daemon, a migration 
facility, an interface to managed resources, a communication facility, and a security 
facility. 




Figure 1: MCE Components



It is assumed that a mobile code daemon (MCD) runs within a Java virtual machine 
on each network component (Figure 1). The mobile code daemon receives digitally 
signed mobile agents and performs authentication checks on them before allowing 
them to run on the network component. While resident on the network component, 
mobile agents access managed resources via the virtual managed component (VMC). 
The VMC provides get, set, event and notification facilities with an access control list 
mechanism being used to enforce security. VMCs are designed to contain management
information base (MIB) and vendor-related information. A migration facility (MF)
provides transport from one network component (NC) to another. The mobile code 
manager (MCM) manages the agent lifecycle while present on the NC. For more 
detailed information on the MCE see [7]. 
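
To make the role of the VMC more concrete, the following sketch shows get and
set operations guarded by an access control list. It is an illustrative Python
sketch only, not the Java MCE described in [7]: the class layout, the agent
identifiers and the use of a plain dictionary as a stand-in for MIB data are
all assumptions, and the event and notification facilities and the signature
checks performed by the daemon are left out.

```python
from typing import Any, Dict, Set


class VirtualManagedComponent:
    """Toy VMC: managed variables plus a per-agent access control list."""

    def __init__(self, mib: Dict[str, Any], acl: Dict[str, Set[str]]) -> None:
        self._mib = dict(mib)  # managed variables (stand-in for MIB data)
        self._acl = acl        # agent id -> permitted operations ("get", "set")

    def _check(self, agent_id: str, op: str) -> None:
        # Agents that are not listed have no permissions at all.
        if op not in self._acl.get(agent_id, ()):
            raise PermissionError(f"{agent_id} is not allowed to {op}")

    def get(self, agent_id: str, name: str) -> Any:
        self._check(agent_id, "get")
        return self._mib[name]

    def set(self, agent_id: str, name: str, value: Any) -> None:
        self._check(agent_id, "set")
        self._mib[name] = value
```

A visiting agent that is only allowed to read would then be able to call get
but would receive a PermissionError from set.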

Mobile code environments are connected with default migration patterns in order to
form mobile code regions [9] with gateways between them. The migration facility is
used to move a mobile agent from one network component to another, either within
the same region or between regions. A single mobile code region will be assumed for
the remainder of this paper. Individual mobile agents may use the default migration
destination or use other algorithms in their choice of migration destination. Migration
algorithms are presented in the section on agent architecture.



3 Mobile Agent Types

The management of networks using delegation and mobile code has seen the
development of a taxonomy of agents [8]. Three principal types of mobile agents are
defined. They are servlets, deglets and netlets. Servlets are extensions or upgrades to 
servers that stay resident as integral parts of those servers. Mobile agents constituting 
servlets are sent from one component to another and are installed as code extensions 
at the destination component; i.e. the agent typically migrates no further. For example, 
a servlet encapsulating the telnet protocol might be sent from one component to 
another in order to facilitate telnet access to the receiving component. Deglets are 
mobile agents that are delegated to perform a specific task and generally migrate 
within a limited region of the network for a short period of time, e.g. to undertake a 
provisioning activity on a network component. Netlets are mobile agents that provide 
predefined functionality on a permanent basis and circulate within the network 
continuously. An example of a netlet might be a component or service discovery agent 
or an agent constituting part of a distributed expert system. This latter example will be 
the subject of a later section. 

In the management of networks using mobile code, the traditional client/server 
interaction represented by an SNMP agent reporting to a single workstation is 
replaced by a set of mobile agents injected by a management workstation that circulate 
throughout the network (typically) reporting only anomalous conditions found. 



4 Swarm Intelligence 

While the MCE enables the transfer of code from one component in the network to 
another, and the principle of delegation provides a reason to use it, the MCE does not 
provide for distributed problem solving by groups or societies of agents. That is the 
role of Swarm Intelligence. 

Swarm Intelligence [10] is a property of systems of unintelligent agents of limited 
individual capabilities exhibiting collectively intelligent behavior. An agent in this 
definition represents an entity capable of sensing its environment and undertaking 
simple processing of environmental observations in order to perform an action chosen 
from those available to it. These actions include modification of the environment in 
which the agent operates. Intelligent behavior frequently arises through indirect 
communication between the agents, this being the principle of stigmergy [11]. It 
should be stressed, however, that the individual agents have no explicit problem 
solving knowledge and intelligent behavior arises because of the actions of societies 
of agents. 

Two forms of stigmergy have been described. Sematectonic stigmergy involves a 
change in the physical characteristics of the environment. Ant nest building is an 
example of this form of communication in that an ant observes a structure developing 
and adds to it. The second form of stigmergy is sign-based. Here, something is 
deposited in the environment that makes no direct contribution to the task being 
undertaken but is used to influence subsequent task related behavior. 

Sign-based stigmergy is used in the foraging behavior of ants. The use of ant 
foraging behavior as a metaphor for a problem-solving technique is generally 
attributed to Dorigo [12]. It is considered central to our work. To date, three 
applications of the ant metaphor in the telecommunications domain have been 
documented [13], [14], [15]. Reference [14] addresses routing in circuit-switched 
networks, while [15] deals with packet-switched networks. Both [14] and [15] propose 
the control plane as the domain in which their systems would most likely operate. 
Reference [14], in particular, provides compelling experimental evidence as to the 
utility of ant search in network routing. 



5 Service Dependency Modeling 

In order to drive the problem solving process — that of fault finding — a model of 
faults, or a concept of services and dependencies between them, is required. 

Within the context of this paper, a network is said to provide services; e.g., private 
virtual circuits (PVCs). When a service is instantiated; e.g. a new PVC is created, it 
consumes resources in that network and subsequently depends upon the continued 
operation of those resources in order for the service to be viable. From a fault finding 
perspective, a service can then be defined in the following way: 

$$S \propto \{(R_i, p_i)\} \tag{1}$$

where $S$ is the service, $R_i$ is the $i$-th resource used in the service, $p_i$ is the 
probability with which the $i$-th resource is used by that service, and the relational 
operator $\propto$ means "depends upon". A resource $R_i$ might be a node, link or other service. 

For example, a PVC that spans part of a network might depend upon the operation 
of several nodes and T1 links. The links, in turn, might depend upon the correct 
operation of several T3 links that carry them in a multi-layer virtual network. An 
example of such dependencies is shown in Figure 2. 

Figure 2: An example virtual network 

Distributed Fault Location in Networks Using Learning Mobile Agents 187 

Three layers within a multi-layer virtual network are partially represented in 
Figure 2. The link ae represents a PVC. This link depends upon links in the layer 
that supports it, in this case the T1 layer represented by links ac and ce. These links, in 
turn, depend upon links in the T3 layer. In the case of link ac, its dependencies include 
links ab and bc. The link ce depends upon the T3 links cd and ce for its operational 
definition. An agent-oriented solution to the PVC configuration problem can be found 
in [16], [17], and [18]. 
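
The layered dependencies described above can be sketched as a small recursive lookup. The fragment below is illustrative only: the dictionary layout, the layer-qualified names such as "T1:ac", the function name resources_of, the usage probabilities of 1.0, and the choice to multiply probabilities along a dependency chain are all assumptions made for the example and are not taken from the paper.

```python
# Each service maps to (resource, probability) pairs, as in S depends upon {(R_i, p_i)}.
# All names and probabilities here are hypothetical illustration values.
DEPENDS_ON = {
    "PVC:ae": [("T1:ac", 1.0), ("T1:ce", 1.0)],
    "T1:ac": [("T3:ab", 1.0), ("T3:bc", 1.0)],
    "T1:ce": [("T3:cd", 1.0), ("T3:ce", 1.0)],
}


def resources_of(service, depends_on=DEPENDS_ON):
    """Return {resource: probability} for all direct and indirect dependencies."""
    found = {}
    stack = [(service, 1.0)]
    while stack:
        current, p_current = stack.pop()
        for resource, p in depends_on.get(current, []):
            # Assumption: probabilities combine multiplicatively along a chain.
            p_total = p_current * p
            # Keep the largest probability if a resource is reachable twice.
            if p_total > found.get(resource, 0.0):
                found[resource] = p_total
                stack.append((resource, p_total))
    return found
```

Calling resources_of("PVC:ae") collects the two T1 links and the four T3 links beneath them, which is the set of resources a service change agent would need to mark for that PVC.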



6 Agent System Architecture 

In the system described here, ant-inspired agents solve problems by moving over the 
nodes and links in a network and interacting with "chemical messages" deposited in 
that network. Chemical messages have two attributes, a label and a concentration. 
These messages are stored within VMCs and are the principal medium of 
communication used between both swarms and individual swarm agents. Chemical 
messages are used for communication rather than raw operational measurements from 
the network in order to provide a clean separation of measurement from reasoning. In 
this way, fault finding in a heterogeneous network environment is more easily 
supported. Chemical messages also drive the migration patterns of agents; they are 
intended to lead agents to areas of the network which may require attention. 
Chemical labels are digitally encoded, having an associated pattern that uses the 
alphabet {1, 0, #}. This encoding has been inspired by those used in Genetic 
Algorithms [19] and Classifier Systems [20]. The hash symbol in the alphabet allows 
for matching of both one and zero and is, therefore, the "don't care" symbol. 
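
A minimal sketch of how a pattern over the alphabet {1, 0, #} could be matched against a binary chemical label is given below. The function name matches and the requirement that pattern and label have equal length are assumptions made for the example; the paper does not specify them.

```python
def matches(pattern: str, label: str) -> bool:
    """Return True if a pattern over {1, 0, #} matches a binary label.

    '#' is the "don't care" symbol and matches either 0 or 1.
    Assumption of this sketch: pattern and label must have equal length.
    """
    if len(pattern) != len(label):
        return False
    return all(p == "#" or p == c for p, c in zip(pattern, label))
```

For instance, the pattern 1#0 matches both labels 110 and 100 but not 111.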

Agents in our system can be described by the tuple A = (E, R, C, MDF, m). This 
definition is described at length in [21] and is only briefly summarized here. An 
overview of the research being conducted into the use of Synthetic Ecologies of 
Chemical Agents (SynthECA) can be found in [22]. Agents can be described using 
five components: 

• emitters (E), 

• receptors (R), 

• chemistry (C), 

• a migration decision function (MDF), 

• memory (m) 

An agent’s emitters and receptors are the means by which the local chemical 
message environment is changed and sensed respectively. Both emitters and receptors 
have rules associated with them in order that the agent may reason with information 
sensed from the environment and the local state stored in memory. The chemistry 
associated with an agent defines a set of chemical reactions. These reactions represent 
the way in which sensed messages can be converted to other messages that can, in 
turn, be sensed by other agents within the network. The migration decision function is 
intended to drive mobile agent migration and it is in this function that the foraging ant 
metaphor, as introduced by Dorigo, is exploited. Migration decision functions have 
the following forms: 

$$p_{ijk}(t) = \begin{cases} F(i,j,k,t) / N_k(i,j,t), & R < R^* \\ S(i,j,t), & \text{otherwise} \end{cases} \tag{2}$$

$$N_k(i,j,t) = \sum_{j \in A(i)} F(i,j,k,t) \tag{3}$$

$$F(i,j,k,t) = \prod_p [T_{ijkp}(t)]^{-\alpha_{kp}} [C(i,j)]^{-\beta} \tag{4}$$

$$F(i,j,k,t) = \begin{cases} \max_j \prod_p [T_{ijkp}(t)]^{-\alpha_{kp}} [C(i,j)]^{-\beta}, & j = j_{max} \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

where: 

$p_{ijk}(t)$ is the probability that the $k$-th agent at node $i$ will choose to migrate to 
node $j$ at time $t$, 

$\alpha_{kp}$, $\beta$ are control parameters for the $k$-th agent and $p$-th chemicals, 

$N_k(i,j,t)$ is a normalization term, 

$A(i)$ is the set of available outgoing links for node $i$, 

$C(i,j)$ is the cost of the link between nodes $i$ and $j$, 

$T_{ijkp}(t)$ is the concentration of the $p$-th chemical on the link between nodes $i$ and $j$ 
for which the $k$-th agent has receptors at time $t$, 

$R$ is a random number drawn from a uniform distribution (0,1], 

$R^*$ is a number in the range (0,1], 

$S(i,j,t)$ is a function that returns 1 for a single value of $j$, $j^*$, and 0 for all others 
at some time $t$, where $j^*$ is sampled randomly from a uniform 
distribution drawn from $A(i)$, 

$F(i,j,k,t)$ is the migration function for the $k$-th agent at time $t$ at node $i$ for 
migration to node $j$, 

$j_{max}$ is the link with the highest value of $\prod_p [T_{ijkp}(t)]^{-\alpha_{kp}} [C(i,j)]^{-\beta}$. 

The intention of the migration decision function is to allow an agent to hill climb in 
the direction of increasing concentrations of the chemicals that a particular agent can 
sense, either probabilistically (equation (4) for F(i,j,k,t)) or deterministically (equation 
(5) for F(i,j,k,t))^. However, from time to time, a random migration is allowed, this 
being the purpose of S(i,j,t). This is necessary, as the network is likely to consist of 
regions of high concentrations of particular chemical messages connected by regions 
of low, or even zero, concentrations of the same chemicals. 

Finally, memory is associated with each agent in order that state can be used in the 
decision-making processes employed by the agent. 
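
The selection step of the migration decision function can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function name choose_next, its parameters, and the fallback to a random move when every weight is zero are assumptions. The per-link values F(i,j,k,t) are taken as already computed, and Python's random.random() samples [0,1) rather than (0,1].

```python
import random


def choose_next(weights, r_star, rng=random, deterministic=False):
    """Pick the next node j given {j: F(i,j,k,t)} for the links in A(i)."""
    nodes = list(weights)
    total = sum(weights.values())  # normalization term N_k(i,j,t)
    # Random migration S(i,j,t): taken when R >= R*, or (an assumption of this
    # sketch) when all weights are zero and there is no gradient to follow.
    if rng.random() >= r_star or total == 0:
        return rng.choice(nodes)
    if deterministic:
        # Deterministic form: always follow the link with the highest weight.
        return max(nodes, key=lambda j: weights[j])
    # Probabilistic form: sample j with probability F / N_k.
    return rng.choices(nodes, weights=[weights[j] for j in nodes], k=1)[0]
```

With r_star set to 1.0 the agent always follows the chemical gradient; smaller values let it occasionally jump at random, which is what allows it to cross regions of zero concentration.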



^ $p_{ijk}(t) = 1$ for $j = j_{max}$ and 0 otherwise. 

6.1 Agent Classes 

The agent classes defined in the system described here are intended to implement an 
active diagnosis system [23]. In active diagnosis systems, monitoring and diagnostic 
activity is undertaken by agents working in a distributed manner in a sensor network. 
The agents perform these activities on a timely basis rather than just when a fault is 
detected. Ishida also describes an immunity-based agent approach to active diagnosis 
that exploits the metaphor of an immune system for active diagnosis. In some sense, a 
fault finding system can be thought of as an immune system and agent classes as 
examples of B-cells and T-cells. In fact, SynthECA agents are characterized by the 
cellular metaphor rather closely as they consist of chemical reactions with a cell 
membrane that consists of effectors and receptors. The internal description of a 
SynthECA agent draws its inspiration from the Chemical Abstract Machine (CHAM) 
[23] and Spreading Activation networks [25]. 

The agent system described here consists of four agent classes. First, condition 
sensor agents (CSAs) are defined. A CSA is an example of a netlet. The function of a 
CSA is to measure one or more parameters associated with a given component and 
determine whether a specific condition is true or false. CSAs interact with VMCs on 
network components by measuring parameters associated with the network 
component; e.g. the utilization of links connected to the node or the utilization of the 
node itself. CSAs are adaptive and learn to (a) avoid components where no valid 
sensory information is available and (b) more frequently visit components that are 
likely to cause the condition of interest to evaluate to true. While the first situation 
appears strange at first reading, it must be noted that we are dealing with 
heterogeneous networks where parameters supported by one vendor may not be 
supported or provided by another^. Therefore, it is likely that CSAs will be vendor 
specific or apply to a subset of all components in the network at best. Also, it is 
intended that our CSAs should be self-configuring. Being netlets, they are injected 
into a mobile code region from a network management workstation and are not 
directed to visit particular components. It is essential, therefore, that CSAs are capable 
of learning an applicable (to them) map of the network. A CSA’s ability to modify the 
frequency with which it visits a component facilitates variable frequency polling of 
components. The more the condition for a CSA evaluates to true, the more likely the 
agent is to visit the component. In this way, CSAs spend more of their processing 
effort on components with potential performance problems rather than allotting equal 
time to all components. A CSA may also leave chemical messages on devices that it 
visits. In this way it is possible for two such agents, one for device type one and the 
other for device type two, to measure different parameters but generate the same 
chemical message for use by the fault finding agents. The separation of measurement 
from reasoning is clearly an advantage here. 

It is worth noting that CSAs are capable of interacting with the old manager/agent 
schema for network management. This can easily be implemented using VMCs. For 
example, an application that uses a local VMC and implements an SNMP protocol 
handler can be installed inside the MCD. Thereafter, it can act as an SNMP agent. 

^ A review of the private part of an SNMP MIB for a small number of devices confirms just 
how diverse devices can be. 

^ It is not possible for a CSA to spend all of its time on a single component. 

Another possibility that has been implemented within the MCE is a handler for an 
extension protocol. The DPI protocol was chosen for implementation because it is a 
lightweight protocol and avoids the BER encoding/decoding that is part of SNMP. In 
this research, a VMC extension registers with an SNMP agent and, acting as an SNMP 
subagent, provides data in response to SNMP requests. This scenario is shown in 
Figure 3. 

Both of these ideas could also be applied in situations where inter-working with a 
legacy system is required. It is possible to associate simulated network components 
with actual devices running legacy agents through properly engineered VMCs. This 
might be the situation where the actual device does not support a Java environment. It 
is also helpful within a research environment to be able to link simulated components 
to the real ones if an idea that has already been tested through a simulation is to be 
tried on a live network. 




Second, service monitoring (SMA) and service change agents (SCA) are defined. A 
service monitoring agent is responsible for monitoring characteristics of a set of 
instances of a service; e.g. the quality of service on one or more PVCs. These agents 
are static and reside where the service is being provided; e.g. at the source of a PVC. 
A service monitoring agent detects changes in the characteristics of the monitored 
service and, if the change is considered significant, a service change agent is sent into 
the network in order to mark the resources on which the service depends with a 
chemical message. The concentration associated with the chemical message reflects 
the change in value of the characteristic of the monitored service. If the change in the 
measured characteristic for the service is considered beneficial, a negative 
concentration will be associated with the chemical message; i.e. the chemical will be 
’evaporated’. If the change in the measured characteristic for the service is considered 
detrimental to the service, a positive concentration will be associated with the 
chemical message; i.e. an existing trail will be reinforced or a new one created. Given 
that resources will be shared by multiple services, it is easy to see that the resources 
common to two services will see twice the change in chemical concentration when the 
SMA detects a significant change. It is this process of chemical interference that 
allows localization of a fault to be inferred. A simple example of chemical 
interference used for fault localization is shown below in Figure 4. In this example, a 
fault has occurred on node E that has resulted in degraded quality of service for the 
two connections present in the network. The SMAs for the two connections have 
detected the degraded quality of service and sent out service change agents to mark 
the resources (in this case nodes and links) that might be at fault. Figure 4 shows the 
concentration of a chemical message that represents the change in quality of service 
on the network nodes and links. Where a node or link has no associated chemical 
concentration, it means that it is zero. Figure 4 clearly shows that the highest 
concentration of the chemical is to be found at node E. 
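
The interference effect can be illustrated with a short sketch. The two routes below are hypothetical (the actual topology of Figure 4 is not reproduced here), as are the function name mark_resources and the unit deposit per service change agent. Each agent deposits chemical on every node and link of its connection, and the shared resource accumulates the highest concentration.

```python
from collections import Counter


def mark_resources(paths, change=1):
    """Deposit `change` units of chemical on every node and link of each path."""
    concentration = Counter()
    for path in paths:
        for node in path:
            concentration[node] += change
        for a, b in zip(path, path[1:]):
            # Links are undirected, so they are keyed by an unordered pair.
            concentration[frozenset((a, b))] += change
    return concentration


# Hypothetical routes of two degraded connections that share node E.
paths = [["A", "E", "D"], ["B", "E", "C"]]
conc = mark_resources(paths)
suspect = max(conc, key=conc.get)  # resource with the highest concentration
```

Here node E receives deposits from both connections, so it is the only resource with a concentration of 2 and is selected as the suspect.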

Problem identification agents, other netlets that circulate continuously throughout 
the network, use the trail of chemical messages laid down in the network in order to 
determine the location of faults and to initiate diagnostic activity. These agents form 
the final class of agents defined. The value of communicating problems to network 
operators rather than a stream of alarms has long been understood [26, 27, 28]. In this 
previous work, a static knowledge base system has been developed where the 
knowledge base is composed of a set of problem classes with communication by 
messaging between them. A problem class represents a model of one or more potential 
faults in the network. Instances of problem classes are intended as hypotheses regarding 
a fault in the network and a winner-take-all algorithm, where the instance explaining 
the most alarms is considered the most likely problem, is used to discriminate between 
competing hypotheses. 
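
The winner-take-all rule can be sketched in a few lines. The data layout (a dictionary from hypothesis name to the set of alarms it can explain), the function name most_likely_problem, and the alarm names in the usage note are assumptions made for illustration; the cited knowledge base system is not reproduced here.

```python
def most_likely_problem(hypotheses, alarms):
    """Return the hypothesis that explains the most observed alarms.

    `hypotheses` maps a hypothesis name to the set of alarms it can explain.
    Ties are broken in favour of the hypothesis listed first.
    """
    observed = set(alarms)
    return max(hypotheses, key=lambda name: len(hypotheses[name] & observed))
```

For example, given hypothetical hypotheses link_down explaining {los, ais} and card_fault explaining {ais, temp, fan}, the observed alarms ais and temp select card_fault, since it explains two of them.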

Mapping a single problem class to a problem agent, and using inter-agent 
communication for inter-problem message passing, seems a natural progression of this 
work. Rather than being alarm driven as reported in previous research, problem agents 
respond to the chemical messages laid down in the network and migrate from 
component to component based upon the concentrations associated with these 
chemical messages. 

6.2 Problem Solving by Agents 

Several problem agents have been implemented. First, a PVC Quality Of Service 
problem agent (qos-agent) has been built. This agent hill climbs in the space of the 
chemical laid down by SCAs. At the beginning of our research, these agents would 
initiate diagnostic activity on a component when a concentration threshold was 
reached and this threshold implied that at least two SCAs have visited the component. 
This, however, has the potential for large numbers of incorrect diagnoses. 

A much-improved solution to the problem is the introduction of reinforcement 
learning techniques to the agent architecture. A reinforcement learner is introduced at 
each node in the network and implemented as part of the VMC. The state associated 
with the reinforcement learner is the vector of concentrations of q-chemical on the 
node and connecting links. For example, in Figure 4, the vector (2, 1, 0, 0, 1, 1, 1) could 
be used to define the state of the node E and its network links. The actions available in 
a given state are to diagnose a component (node or link) or not do anything. 
Diagnostic actions are also stored within the VMC. Action selection is based upon the 
Q value associated with the state. The reinforcement signal within the system is 
provided by the SCAs. If the qos-agent selects the correct component for diagnosis^, 
the SMAs will detect the change and send SCAs into the network in order to modify 
the concentrations of q-chemical on the various nodes and links that form part of the 
circuit. This change will, in turn, be sensed by the qos-agent, which will increase the 
value associated with taking that action in that state. If an incorrect component is 
chosen for diagnosis, two situations are possible. Firstly, if we assume that diagnostic 
actions cannot make the quality of service of the connection degrade further, then 
changes of that kind that the qos-agent sees are not a result of its actions. It does not 
use these signals to update the value associated with choosing that diagnostic action. 
They are assumed to be the result of a fault elsewhere in the network^. Secondly, it is 
possible that no improvement in quality of service is seen by the SMAs whose circuits 
depend upon the component being diagnosed. In this situation, the qos-agent "times 
out" and applies a negative signal to the action associated with the initial state. It then 
attempts (up to) two further diagnoses before migrating to a new node. Should one of 
the remaining diagnoses improve the quality of service for the circuits depending upon 
the component diagnosed, the feedback is applied in a discounted fashion to the one 
or two diagnoses that preceded it. This apportionment of the reinforcement signal is 
done to take account of latency effects in the network. 
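
The per-node learner and the discounted apportionment of feedback might look roughly like the sketch below. It is not the authors' code: the class name NodeLearner, the tabular representation, the update rule, and the values of the learning rate and discount are all assumptions chosen for illustration. The state is the concentration vector for a node and its links, as in the text.

```python
from collections import defaultdict


class NodeLearner:
    """Tabular value learner held at one node (illustrative sketch only)."""

    def __init__(self, actions, rate=0.1, discount=0.5):
        self.actions = list(actions)   # components to diagnose, plus "none"
        self.rate = rate               # learning rate (assumed value)
        self.discount = discount       # decay per earlier attempt (assumed value)
        self.q = defaultdict(float)    # (state, action) -> estimated value

    def select(self, state):
        """Greedy choice: the action with the highest value in this state."""
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def reinforce(self, history, reward):
        """Apply `reward` to the latest attempt and discounted shares to earlier ones.

        `history` is a list of (state, action) pairs, oldest first.
        """
        for age, (state, action) in enumerate(reversed(history)):
            signal = reward * (self.discount ** age)
            key = (state, action)
            # Assumed update rule: move the estimate a fraction toward the signal.
            self.q[key] += self.rate * (signal - self.q[key])
```

If a diagnosis of a link is followed by a successful diagnosis of node E, calling reinforce with both attempts gives E the full signal and the earlier link diagnosis a discounted share, which mirrors the latency argument made above.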

As stated above, diagnostic actions are initiated by interaction with the component 
through a VMC. When such activity is initiated, and the diagnostic activity is 
successful as measured by improved quality of service, the concentration of the 
'chronic-failure' chemical, or c-chemical, is increased on the component. The amount 
of c-chemical deposited on the device is proportional to the time taken to receive the 
positive reinforcement signal. 

A Chronic Failure problem agent has been defined in the system that senses the 
c-chemical for the purpose of identifying components that experience multiple faults in 
short periods of time. The concentration of c-chemical is used within the migration 
decision function of explorer agents^ to determine where new connections should be 
made. In order that c-chemical concentrations do not increase unchecked, a CSA has 
been included in the system that periodically visits components and 'evaporates' 
c-chemical concentrations. For details on the role of chemicals and agent chemistry in 
SynthECA, the reader should consult [21]. 

^ We assume that the fault correction activity, if initiated on the correct component, will be 
successful. Diagnosis is not the focus of this paper, fault location is. 

^ We do not assume single faults in our system; several may be present in the network at the 
same time. 

^ Explorer agents are described at length in [21]. 

Finally, an Overload problem agent has been defined. This agent hill climbs in the 
space of the concentration of a chemical generated by CSAs that circulate in the 
network, monitoring component and link utilization parameters. Again threshold 
driven, it is intended that persistently over-utilized components are identified in order 
to facilitate re-planning of the network. Results of that work are not presented here. 



7 Results 



The routing and fault location system described briefly in the previous sections was 
applied to a number of small networks, one of which is shown in Figure 5. Each 
component in the network was assumed to have a probability of causing degraded 
performance of 0.1, and 5 distinct quality of service degradation levels were defined. 
The experimental setup and nature of traffic patterns that were applied to this network 
are defined in [17]. Quality of service changes were randomly injected into the network 
in order to test the response of the system. A reinforcement learner was initialized on 
each node such that the most likely action chosen for any state was the diagnosis of the 
component associated with the highest individual component of that vector 
(assuming > 2). 




Figure 5: Experimental Graph I 



Figure 6: Learning Results (two panels, each titled "Fitness vs Generations") 

The number of qos-agents was varied from 1 to 5. The reason for this is that 
qos-agents acting independently can cause incorrect feedback to be seen by one another 
and thereby degrade learning performance. This is the so-called Tragedy of the 
Commons problem often observed in multi-agent learning systems [29]. While a single 
qos-agent would eventually visit and diagnose the correct component, this would lead 
to unacceptable fault location times in large networks. However, having too many 
agents causes inferior learning performance owing to the poor nature of the 
reinforcement signals. Increasing the number of qos-agents increases the probability 
that the successful diagnosis by one agent will be seen as a positive reinforcement 
signal by another. Wolpert et al. [29] provide a useful analysis of the properties of a 
MAS with reinforcement learning that overcomes these problems. Examples of 
learning performance for two typical runs are shown in Figure 6 for two qos-agents in 
the system. The curves shown represent the trend in performance, not the raw 
experimental data. 

Several experiments were performed with varying numbers of qos-agents in the 
system. For the size of network shown in Figure 5, two agents were found optimal in 
the sense that the converged performance of the reinforcement learners was superior to 
that of all other qos-agent configurations. The variation of converged performance with 
the number of qos-agents is shown in Figure 7. The difference in converged 
performance between one and two agents is small, but two agents are slightly superior. 
In addition, the time to diagnose the location of a fault is lower. 

Figure 7: QOS Agent number variation (converged performance vs. number of agents, 1 to 5) 



8 Conclusions 

This paper has presented a multi-agent system that relies on Swarm Intelligence and, 
in particular, trail laying behavior in order to locate faults in a communications 
network. This architecture promotes the idea of a clear separation of sensing and 
reasoning amongst the classes of agents used and promotes the idea of active, or 
collective, diagnosis. A chemically inspired messaging system augmented with an 
exploitation of the ant foraging metaphor has been proposed in order to drive the 
mobile agent migration process. The paper has demonstrated how fault location 
determination can arise as a result of the trail-laying behavior of simple problem 
agents. An implementation of this architecture has demonstrated that mobile agents 
can be effectively used to find faults in a network context. The service dependency 
model concept, along with the introduction of reinforcement learning techniques for 
the learning of models of fault location, has shown that global models of the network 
need not be provided in order that effective fault location can occur. However, our 
research has observed interference between multiple qos-agents, and our future work 
will consider mechanisms based upon Wolpert's Collective Intelligence (COIN) 
research in order to overcome this. 

Abstract. A Reactive system is one that is in continual interaction with its 
environment, and executes at a pace determined by that environment. Examples 
of such systems are network protocols, industrial-process control systems etc. 

The use of rigorous formal methods in specification and validation can help 
designers to limit the introduction of potentially faulty components during the 
construction of the system. 

Due to their complex nature, reactive systems are extremely difficult to 
specify and validate. In this paper, we propose a new formal model for the 
specification and the validation of such systems. This approach considers a 
Reactive System as a Reactive Multi-Agent System consisting of concurrent 
reactive agents that cooperate with each other to achieve the desired 
functionality. In addition, this approach uses formal synchronous specification 
and verification tools in order to specify and to verify the system's behaviors. 
Finally, an example of an application of the approach is mentioned. 

Keywords. Reactive systems, reactive agent, specification, formal methods, 
verification. 



1 Introduction 

A Reactive system is one that is in continual interaction with its environment, and 
executes at a pace determined by that environment. Examples of such systems are 
network protocols, industrial-process control systems etc. Reactive systems are 
responsive systems consisting of two or more reactive parallel sub-processes that 
continuously cooperate to achieve a pre-defined goal [1]. In addition, such systems are 
intrinsically state based, and transition from one state to another is based on external 
and internal events. Another characteristic of reactive systems is that they must take 
into account a great number of events and temporal constraints. Thus, Reactive systems 
are complex computer systems, and may not be modeled by transformational techniques. 

The use of rigorous formal methods in specification and validation can help 
designers to limit the introduction of potentially faulty components during the 
construction of the system. Specification modeling is an important stage in reactive 
system design where the designers specify the desired properties in the form of a 
specification model that acts as the guidance and source for the implementation. 

H. Nakashima, C. Zhang (Eds.): PRIMA'99, LNAI 1733, pp. 197-210, 1999. 
© Springer-Verlag Berlin Heidelberg 1999 

198 Bouchaib Bounabat et al. 

Validation of an abstract specification of a reactive system is an important aspect 
of system design. The operational problem here is how to determine if a reactive 
system is successful. One approach for validation is to consider observable behavior 
as a criterion to determine success. 

Due to their complex nature, reactive systems are extremely difficult to specify and 
validate. In this paper, we propose a new formal model for the specification and the 
validation of such systems. This approach considers a Reactive System as a Reactive 
Multi-Agent System, i.e. a distributed computing system consisting of several 
autonomous reactive agents (as computing units) that coordinate their actions in order 
to fulfill usually joint but also sometimes competitive tasks. Concurrency is further 
characterized by the need to express communication and synchronization among 
concurrent agents. 

This paper is organized as follows: Section 2 surveys the specification and 
verification tools used. Section 3 sets out the proposed formal model and its related 
temporal constraints. Section 4 describes the proposed hierarchical structure of 
Reactive Systems. Section 5 presents an example of a reactive system which has 
been specified and verified with the approach. 

2 Specification and Verification Tools 

This section will describe all the specification and verification tools used in this work. 

2.1 STATECHARTS 

STATECHARTS (SC) were introduced by Harel [2] [3] as a visual formalism that 
provides a way to represent state diagrams with notions like hierarchy, concurrency, 
broadcast communication and temporized state. A SC can be seen as one or several 
automata whose transitions are labeled by ?event[condition]/!action. SC is said to be 
synchronous because the system reacts to events by instantly updating its internal state 
and producing actions. The actions produced can trigger other transitions in the same 
instant; this is called a chain reaction and causes a set of transitions. The system is 
always in a waiting state until the condition for a transition is true. 

2.2 ESTEREL 

To this end, the specified SC behaviors are automatically translated into the
synchronous language ESTEREL [4] [5] [6]. It is a language, with precisely defined
mathematical semantics, for programming the class of input-driven deterministic
systems. The ESTEREL software environment provides high-quality tools,
including an editor, compiler, simulator (the XES tool), debugger and verifier.

2.3 Real-Time Temporal Logic

Temporal logic has been widely used for the specification and verification of 
concurrent systems [7] [8]. However, these temporal logics only allow qualitative 
reasoning about time. Several extensions have been proposed for expressing and 
reasoning about real-time systems. These include Real-Time Temporal Logic (RTTL) 
[9], which is based on linear time temporal logic, and allows in addition the 
expression of quantitative real-time properties (e.g. exact delays or event deadlines). 

Example of RTTL Formula

$s_1 \land t = T \rightarrow \Diamond (s_2 \land t \le T + 5)$: if $s_1$ is true now and the clock reads $T$ ticks, then
$s_2$ must become true before the clock reads more than $T + 5$ ticks. Thus, once $s_1$ becomes true, $s_2$ must
become true no more than 5 ticks later. This formula can also be written as follows:

$s_1 \rightarrow \Diamond_{[0,5]}\, s_2$ or $s_1 \rightarrow \Diamond_{\le 5}\, s_2$

The formula $s_1 \leftrightarrow s_3$ indicates that the events $s_1$ and $s_3$ are simultaneous. If $C(w)$ is an
RTTL formula defining a temporal constraint on an event $w$, then $w \models C(w)$ means
that $w$ satisfies the formula $C(w)$.
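
The bounded-response pattern of the example above (a trigger must be followed by a response within a fixed number of ticks) can be illustrated on a finite trace of events. The following is only a minimal sketch, not part of the RTTL proof system: the function name `bounded_response` and the trace encoding (one set of event names per clock tick) are assumptions made for illustration, and a trigger whose response has not been seen by the end of the trace counts as a violation.

```python
from typing import List, Set


def bounded_response(trace: List[Set[str]], trigger: str, response: str, bound: int) -> bool:
    """Return True if every occurrence of `trigger` at tick t is followed by
    `response` at some tick in [t, t + bound] (finite-trace reading)."""
    for t, events in enumerate(trace):
        if trigger in events:
            window = trace[t:t + bound + 1]
            if not any(response in tick for tick in window):
                return False
    return True


# Hypothetical trace: one set of events per clock tick.
trace = [{"s1"}, set(), set(), {"s2"}, set()]
print(bounded_response(trace, "s1", "s2", 5))  # True: s2 occurs 3 ticks after s1
print(bounded_response(trace, "s1", "s2", 2))  # False: s2 occurs too late
```

Such a check only examines one observed trace; it does not replace the deductive verification discussed in this paper.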

3 Reactive Decisional Agent 

In this paper, agents are classified as either deliberative or reactive [10] [11].
Deliberative agents derive from the deliberative thinking paradigm: the agents
possess an internal symbolic reasoning model, and they engage in planning and
negotiation in order to achieve coordination with other agents. Reactive agents do not
have any internal symbolic model of their environment; they act using a
stimulus/response type of behavior, responding to the present state of the
environment in which they are embedded.

3.1 A Brief Critical Review of Reactive Agents Work 

There is a pressing need for a clearer methodology to facilitate the development of
reactive software agent applications. This requires the development of more
associated theories, architectures and languages.

Among the few current approaches to specifying Reactive Agents, [12] describes
agents, tasks and environments using the Z specification language [13]; [14] specifies
reactive agent behavior with Real Time Knowledge models; and [15] describes agents
using temporal logic tools.

All of these approaches lack tools for formally verifying the modeled behavior.
Our purpose is to build a formal model of a reactive agent based on the decisional
object concept [16]. STATECHARTS models are used here to describe
the reactive agent’s behavior. These behaviors are formally checked
qualitatively with the synchronous language ESTEREL and quantitatively by
Real Time Temporal Logic deduction.

3.2 Formal Description

The proposed model of a reactive agent consists of decisional models
that represent objects according to their behavioral aspects and their
degree of intelligence.

Definitions. A Reactive Decisional Agent (RDA) is a 9-tuple denoted $\langle A, D, S, E', O, O', act, dec, sig \rangle$ where:

- A: Set of actions exerted on the agent. Each action undergone by an object
represents a possible operation to be carried out on this object in order to achieve a
specific goal.

- D: Set of decisions generated by the agent. Each decision is a solution concerning
future process behavior; each decision is characterized by its action horizon
Ha, the time during which this decision remains valid.

- S: Set of signalings received by the agent. Each signaling received by an object
reflects, at any given time, the state of the controlled tools used to achieve a specific
goal.

- E’: Set of external states delivered by the agent. Each one represents the object
state emitted to the environment.

- E: Set of the agent’s internal states. Each one indicates the current state of the agent.

- O: Set of the agent’s internal objectives. Each decision is elaborated in order to
achieve an internal objective according to the current external objective and the
actual internal state.

- O’: Set of the agent’s external objectives that can be achieved. These objectives
represent the agent’s interpretation of each action.

From a dynamic point of view, the sets above indicate the received events (A, S), the
emitted events (D, E’) and the internal events (E, O, O’).

Decisional Functions. act, dec, and sig are three decisional functions that define the
behavior of an RDA.

$act : A \rightarrow O'$

$a \mapsto o'$ with,

$\forall a \in A,\ \exists!\, o' \in O' \ /\ o' = act(a) \rightarrow [a \leftrightarrow o']$ (1)

(1) means that the occurrence of an action $a$ instantaneously implies the
occurrence of its associated external objective $o'$ through the function act.

$dec : O' \times E \rightarrow D \times O$

$(o', e) \mapsto (d, o)$ with,

$dec(o', e) = (d, o) \rightarrow [o' \land e \leftrightarrow d \land o]$ (2)

(2) means that, depending on the current external objective $o'$ and as soon as the
agent is in an appropriate internal state $e$, the corresponding decision $d$ and internal
objective $o$ are instantaneously produced by the function dec.

$sig : O' \times O \times S \rightarrow E \times E'$

$(o', o, s) \mapsto (e, e')$ with,

$sig(o', o, s) = (e, e') \rightarrow [o' \land o \land s \leftrightarrow e \land e']$ (3)

(3) means that, depending on the current external objective $o'$ and the expected
internal objective $o$, as soon as a signaling $s$ is received, its associated external
state $e'$ is instantaneously emitted and the new internal state of the agent becomes $e$.
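
The three decisional functions can be read as lookup tables driving a small state machine. The sketch below is one possible reading under stated assumptions, not the authors' implementation: the class `ReactiveDecisionalAgent`, the helper `make_line_agent` and the initial state name `I_Idle` are hypothetical, while the event names for the line agent are taken from the example and appendix of this paper.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ReactiveDecisionalAgent:
    act: Dict[str, str]                               # a -> o'
    dec: Dict[Tuple[str, str], Tuple[str, str]]       # (o', e) -> (d, o)
    sig: Dict[Tuple[str, str, str], Tuple[str, str]]  # (o', o, s) -> (e, e')
    state: str                                        # current internal state e
    ext_objective: str = ""                           # current external objective o'
    int_objective: str = ""                           # current internal objective o

    def on_action(self, a: str) -> str:
        """Receive action a and return the decision d emitted in the same instant."""
        self.ext_objective = self.act[a]
        d, self.int_objective = self.dec[(self.ext_objective, self.state)]
        return d

    def on_signaling(self, s: str) -> str:
        """Receive signaling s, update the internal state, return the external state e'."""
        key = (self.ext_objective, self.int_objective, s)
        self.state, external_state = self.sig[key]
        return external_state


def make_line_agent() -> ReactiveDecisionalAgent:
    """Line agent restricted to the connect scenario; I_Idle is a hypothetical initial state."""
    return ReactiveDecisionalAgent(
        act={"A_Connect_Line": "X_To_Connect"},
        dec={("X_To_Connect", "I_Idle"): ("D_Connect", "O_Wait_Connection")},
        sig={("X_To_Connect", "O_Wait_Connection", "S_COk"):
             ("I_Connected", "E_Line_Connected")},
        state="I_Idle",
    )


line = make_line_agent()
print(line.on_action("A_Connect_Line"))  # D_Connect
print(line.on_signaling("S_COk"))        # E_Line_Connected
```

The sketch ignores time entirely; the action horizon and the breakdown state are covered by the temporal constraints of Section 3.3.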




Fig. 1. Internal structure of a Reactive Decisional Agent, following the formal definitions
above. Act interprets an action as an external objective, which is then used by Dec
and Sig to generate the agent’s appropriate responses.

Internal Architecture of an RDA. This section presents a set of SC models that describe
the external objectives of an RDA.

External Objectives Manager. A Reactive Decisional Agent has an External
Objectives Manager. It consists of an SC model of the function act described above
(Fig. 2).




Fig. 2. SC model of the External Objectives Manager. Each
state represents an external objective whose activation is started by the reception of a specific
action (?Action) and terminated by the emission of the acknowledgment external state
(!ExternalObjective).

In addition, each operating mode of the agent (normal mode, diagnostic modes, etc.)
can be considered as an external objective to be reached. The objectives manager has
to maintain the same objective or change it, according to the fault or
failure that has occurred.

External Objectives Modeling. An external objective is composed of several other SC
states corresponding to the associated internal states and internal objectives, which are
deduced from the definitions of the functions dec and sig (Fig. 3).

Fig. 3. General SC model of an external objective. The transition
(internal state → internal objective) is made by a decision emission (!Decision), and the
transition (internal objective → internal state) is made by a signaling receipt (?S_OK) and
possibly an external state emission (!e’). Internal state C corresponds to the default initial
state of an SC model. Internal states and internal objectives are denoted by Ci and Oi respectively.
If an action horizon is exceeded without any acknowledgment signaling being received, the
agent’s internal state changes from Ci to ebi (breakdown state).

3.3 Temporal Constraints of an RDA 

Decision Temporal Constraints. Each decision is characterized by its action horizon
$H_a$, the time during which this decision remains valid. Thus, an occurrence of a decision
requires the occurrence of its corresponding acknowledgment signaling within a delay that
does not exceed its action horizon.

This defines the following function, acqDec:

$acqDec : D \rightarrow S \times \mathbb{N}$

$d \mapsto (s, H_a) = acqDec(d)$, with

$acqDec(d) = (s, H_a) \rightarrow [d \rightarrow \Diamond_{\le H_a}\, s]$ (4)

In the following sections and for any decision $d$:

- $acqDec(d)$ indicates the acknowledgment signaling of $d$,

- $H_a(d)$ is the action horizon of $d$,

- $C(d)$ denotes the constraint $[d \rightarrow \Diamond_{\le H_a(d)}\, acqDec(d)]$.

The temporal property that an RDA must verify is:

$\forall d \in D,\ d \models C(d)$ (5)

External Objective Temporal Constraints. Each external objective $o'$ is
characterized by a specific acknowledgment external state $e'$ that indicates the successful
completion of $o'$. This defines a function acq:

$acq : O' \rightarrow E'$

$o' \mapsto e' = acq(o')$, with

$\forall o' \in O',\ \exists!\, e' \in E' \ /\ e' = acq(o')$ (6)

Dynamically, the event $acq(o')$ occurs as soon as the
acknowledgment of the last decision generated by $o'$ is received.

Another function, durMax, is introduced in order to associate with each
external objective $o'$ the longest duration of the execution of its operations.

$durMax : O' \rightarrow \mathbb{N}$

$o' \mapsto \sum_{i=1}^{card(D(o'))} H_a(d_i)$, where $d_i \in D(o')$

By combining the two functions acq and durMax, we obtain the following
constraint:

$\forall o' \in O',\ o' \rightarrow \Diamond_{\le durMax(o')}\, acq(o')$ (7)

i.e. after an occurrence of an external objective $o'$, the agent must generate the
corresponding acknowledgment within a delay that does not exceed $durMax(o')$.
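
As a small worked illustration of durMax and of constraint (7), the sketch below sums the action horizons of the decisions attached to one external objective and compares the result with an observed completion delay. It assumes that decisions are issued one after another, each as soon as the previous one has been acknowledged; the decision names `d1`, `d2` and all numbers are hypothetical.

```python
from typing import Dict, List


def dur_max(horizons: Dict[str, int], decisions: List[str]) -> int:
    """durMax(o'): sum of the action horizons Ha(d) over the decisions of o'."""
    return sum(horizons[d] for d in decisions)


def completion_delay(ack_delays: Dict[str, int], decisions: List[str]) -> int:
    """Total delay, assuming each decision is issued when the previous one is acknowledged."""
    return sum(ack_delays[d] for d in decisions)


# Hypothetical external objective with two decisions (illustrative values, in ticks).
horizons = {"d1": 4, "d2": 3}    # Ha(d1), Ha(d2)
ack_delays = {"d1": 2, "d2": 3}  # observed acknowledgment delays
decisions = ["d1", "d2"]

each_in_time = all(ack_delays[d] <= horizons[d] for d in decisions)
bound = dur_max(horizons, decisions)
total = completion_delay(ack_delays, decisions)
print(each_in_time, total, bound)  # True 5 7
```

Under the sequential assumption, whenever every decision is acknowledged within its own horizon, the total delay cannot exceed the sum of the horizons.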

Action Temporal Constraints. Another function, rep, is introduced in order to define
the acknowledgment of an action received by the agent.

$rep : A \rightarrow E'$

$a \mapsto e' = acq(act(a))$

$C(a)$ denotes the constraint $[a \rightarrow \Diamond_{\le durMax(act(a))}\, rep(a)]$. The temporal property that an
RDA must verify is:

$\forall a \in A,\ a \models C(a)$ (8)

The following assertion can be proved by deduction:

$\forall a \in A,\ [\forall d \in D(act(a)),\ d \models C(d)] \rightarrow a \models C(a)$ (9)

4 RDA Based Hierarchical Structure of Reactive Systems 

We consider that a reactive system can be modeled as a distributed computing system
consisting of several autonomous RDAs.

4.1 Internal Organization of a Reactive System

A reactive system is defined by a set of agents connected to each other by
communication interfaces. Thus, its basic structure rests on a two-level tree (Fig. 4).




Fig. 4. The internal organization of a reactive system is a tree made up, in
parallel, of a supervisor (Supervisory Agent), two or more sub-agent components, and two
communication interfaces between the supervisor and the sub-agents.

Such a system interacts with its environment by means of:

- Actions exerted by this environment. 

- External States emitted to the environment. 

Supervisory and Sub-Agents Levels. The supervisory agent (SRDA: Supervisory
Reactive Decisional Agent) is an RDA that controls the component sub-agents in order
to achieve a goal or to solve a given problem.

This agent manages the activation sequences and the definition of the controlled
sub-agents’ objectives. This management depends on:

- the actions exerted by the environment,

- the events generated by the sub-agents’ activities,

- the temporal constraints specific to any reactive system.

In addition, a reactive system can be reduced to a single SRDA directly
connected to the controlled process. Each sub-agent can itself be considered as a reactive
system. Thus, its internal structure is composed of its own SRDA, communication
interfaces and sub-agents. A sub-agent’s objectives are to carry out sequences of tasks
in response to any temporally constrained action exerted on it by the higher level.

Communication Interfaces. The communication interfaces are of two types: a
decisional interface (top-down) and a signaling interface (bottom-up).



Fig. 5. Decisional interface, which translates a decision (d) generated by the SRDA into several
actions, each of which is intended for a sub-agent of the lower level.




Fig. 6. Signaling interface, which synchronizes the external states (e’) sent by each sub-agent
and emits one signaling (s) intended for the SRDA.
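
The role of the two interfaces can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' design: the classes `DecisionalInterface` and `SignalingInterface`, their methods, and the event names `d`, `a1`, `a2`, `e1`, `e2`, `s` are all hypothetical. The decisional interface fans one decision out into actions; the signaling interface waits until every expected external state has arrived before emitting a single signaling.

```python
from typing import Dict, List, Optional, Set


class DecisionalInterface:
    """Top-down: translates one SRDA decision into actions for the sub-agents."""

    def __init__(self, routing: Dict[str, List[str]]):
        self.routing = routing  # decision -> list of sub-agent actions

    def translate(self, decision: str) -> List[str]:
        return list(self.routing[decision])


class SignalingInterface:
    """Bottom-up: synchronizes sub-agent external states into one signaling."""

    def __init__(self, expected: Set[str], signaling: str):
        self.expected = set(expected)
        self.signaling = signaling
        self.received: Set[str] = set()

    def receive(self, external_state: str) -> Optional[str]:
        """Record an external state; return the signaling once all expected
        states have arrived, otherwise None."""
        if external_state in self.expected:
            self.received.add(external_state)
        if self.received == self.expected:
            self.received.clear()
            return self.signaling
        return None


decisional = DecisionalInterface({"d": ["a1", "a2"]})
signaling = SignalingInterface({"e1", "e2"}, "s")
print(decisional.translate("d"))  # ['a1', 'a2']
print(signaling.receive("e1"))    # None
print(signaling.receive("e2"))    # s
```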

4.2 Temporal Properties of the Specified Reactive System 

Through the notion of the action horizon (Ha) of a decision, the time during which the
decision remains valid, the RDA-based specification of a reactive system ensures that
the elements will have time periods coherent with the decision made by the agent, and
coherent with the time periods of decisions made at lower levels of the hierarchy. The
higher an agent is in the hierarchy, the greater its action horizon (Fig. 7).




Fig. 7. Flow of information inside a multi-agent system formed by RDAs. The top-down flow
consists of actions ($a$, $a_{ij}$) and their associated decisions ($d_i$, $d_{ij}$). The bottom-up flow consists
of external states ($rep(a)$, $rep(a_{ij})$) and their associated signalings ($acqDec(d_i)$, $acqDec(d_{ij})$).

The temporal constraints must be checked at each hierarchical level. The recursive
character of this structure makes it possible to generalize the results obtained for a
single hierarchical level. Thus, we can prove by deduction, using the notation of
Fig. 7:

$d_{ij} \models C(d_{ij}) \rightarrow a \models C(a)$ (10)



5 Example 

We consider here an example borrowed from the communications field. It forms part
of AT&T’s Automatic Switching Protection System [17].

5.1 System Description 

The idea is to provide more than one line for each communication channel. If a line
fails, a backup line, called the ‘protection line’, is used instead. A line is considered
failed when the bit rate exceeds the degraded range, or whenever other hard failures
have occurred, such as a complete loss of signal. The expected response to a failed
signal on the working line is to automatically switch to the protection line, if that line
is in better condition. The APS description consists of the specification of the supervisory
agent APS System, the communication interfaces, and the agents Line_W and Line_P.



Fig. 8. APS architecture with its SC models and communication interfaces. The objective
X_ChangeToP consists of two SC diagrams: protection line starting and working line repair.
The reception of S_WCleared implies the emission of E_APS_To_NFunction, i.e. the return to
normal operation. This is shown at the external objectives manager level by the re-establishment
of the initial objective (X_Install_W). The emission of an external state by an external
objective SC (?E_line_connected, for example) can be used by the external objectives manager
in order to change the current external objective (!E_line_connected).

5.2 Checking of the Models

The operational problem here is how to determine whether a reactive system is successful.
The approach adopted here is to translate the specified SC behaviors into the
synchronous language ESTEREL. With the automated translation tool developed
in [17], a modelled reactive system is mapped to ESTEREL by
translating the communication interfaces (ID, IS), the supervisory agent (APSS) and
the sub-agents (LineW, LineP). The ESTEREL code associated with the APS module is:

module APS :

input % APSS Actions, LineW and LineP Signalings
  A_Start_APS, A_Stop_APS, S_PCOk, S_WCOk, S_PDOk, S_WDOk, S_PReady, S_WReady;
output % APSS External States, LineW and LineP Decisions
  E_APS_Stopped, E_APS_ToStop, E_APS_NStarted, E_APS_DStarted,
  E_APS_ToNFunction, E_APS_ToDFunction, D_PConnect, D_PDisconnect, D_WConnect,
  D_WDisconnect, D_PCheck, D_WCheck;
signal % Interfaces Events
  A_Connect_PLine, A_Disconnect_PLine, A_Check_PLine, E_PLine_Connected,
  E_PLine_Cleared, E_PLine_Disconnected, A_Connect_WLine, A_Disconnect_WLine,
  A_Check_WLine, E_WLine_Connected, E_WLine_Cleared, E_WLine_Disconnected,
  S_WCleared, S_PSelected, S_WSelected, S_Stopped, D_FSP, D_WW, D_FSW, D_Arret
in % Parallel ESTEREL modules
  run APSS || run ID || run IS || run Line_P || run Line_W
end signal;

The output of this translation is a piece of ESTEREL code that can be compiled into
a finite state machine by the ESTEREL compiler and formally checked against the following
temporal properties using the ESTEREL automated verification tools.

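Sub-Agent Line Temporal Properties. From property (9):

$\forall a \in A,\ [\forall d \in D(act(a)),\ d \models C(d)] \rightarrow a \models C(a)$

we can deduce the following properties:

$\mathrm{D\_Connect} \models C(\mathrm{D\_Connect}) \rightarrow \mathrm{A\_Connect\_Line} \models C(\mathrm{A\_Connect\_Line})$ (11)

where $C(\mathrm{D\_Connect})$: $\mathrm{D\_Connect} \rightarrow \Diamond_{\le 4}\, \mathrm{S\_COk}$

and $C(\mathrm{A\_Connect\_Line})$: $\mathrm{A\_Connect\_Line} \rightarrow \Diamond_{\le 4}\, \mathrm{E\_Line\_Connected}$

$\mathrm{D\_Disconnect} \models C(\mathrm{D\_Disconnect}) \rightarrow \mathrm{A\_Disconnect\_Line} \models C(\mathrm{A\_Disconnect\_Line})$ (12)

where $C(\mathrm{D\_Disconnect})$: $\mathrm{D\_Disconnect} \rightarrow \Diamond_{\le 3}\, \mathrm{S\_DOk}$

and $C(\mathrm{A\_Disconnect\_Line})$: $\mathrm{A\_Disconnect\_Line} \rightarrow \Diamond_{\le 3}\, \mathrm{E\_Line\_Disconnected}$

$\mathrm{D\_Check} \models C(\mathrm{D\_Check}) \rightarrow \mathrm{A\_Check\_Line} \models C(\mathrm{A\_Check\_Line})$ (13)

where $C(\mathrm{D\_Check})$: $\mathrm{D\_Check} \rightarrow \Diamond_{\le 2n}\, \mathrm{S\_Ready}$

and $C(\mathrm{A\_Check\_Line})$: $\mathrm{A\_Check\_Line} \rightarrow \Diamond_{\le 2n}\, \mathrm{E\_Line\_Cleared}$

(11), (12) and (13) can be checked by deduction (Appendix) or by XES simulation.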

XES Simulation. The XES simulator provides a graphical representation of the generated
state machine, and helps designers to verify the different behaviors.

The XES simulation of the A_Disconnect_Line action can be summarized by Figure 9,
which shows the events received or emitted by the RDA Line.



Fig. 9. Simulation of the communication line agent Line. The reception of the action
A_Disconnect_Line and the emission of the decision D_Disconnect are simultaneous. The reception
of S_DOk before the action horizon (3 ticks maximum) of D_Disconnect expires triggers the simultaneous
generation of the external state E_Line_Disconnected. (12) is thus checked. The same verification
can be applied to properties (11) and (13).



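Expression of the APS Properties. From property (10):

$d_{ij} \models C(d_{ij}) \rightarrow a \models C(a)$ (10)

we can deduce:

$\mathrm{D\_WConnect} \models C(\mathrm{D\_WConnect}) \land \mathrm{D\_PDisconnect} \models C(\mathrm{D\_PDisconnect}) \rightarrow \mathrm{A\_Start\_APS} \models C(\mathrm{A\_Start\_APS})$ (14)

$C(\mathrm{D\_WConnect})$: $\mathrm{D\_WConnect} \rightarrow \Diamond_{\le 4}\, \mathrm{S\_WCOk}$
$C(\mathrm{D\_PDisconnect})$: $\mathrm{D\_PDisconnect} \rightarrow \Diamond_{\le 3}\, \mathrm{S\_PDOk}$
$C(\mathrm{A\_Start\_APS})$: $\mathrm{A\_Start\_APS} \rightarrow \Diamond_{\le 6}\, \mathrm{E\_APS\_NStarted}$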



Verification. Two types of XES simulation can be used to check the liveness
properties of the system:

- The end-user point of view, i.e. masking the system’s internal events. This
simulation is carried out by observing the behavior of APSS (Fig. 10).



Fig. 10. APS simulation from an end-user point of view. The reception of the action
A_Start_APS and the emission of the decision D_FSW are simultaneous. The reception of
S_WSelected before the action horizon (6 ticks) of D_FSW expires triggers the simultaneous
generation of the external state E_APS_NStarted. This checks the behavior of APSS.

- The modelled agent point of view, i.e. showing the system’s internal actions as
well as the customer’s incoming and outgoing events. This simulation is carried
out by observing the behavior of all the system’s components (Fig. 11).




Fig. 11. APS simulation from an agent point of view. The reception of the action A_Start_APS
implies the simultaneous emission of D_WConnect (by Line_W) and D_PDisconnect (by Line_P). The
reception by Line_W of S_WCOk before the action horizon (4 ticks) of D_WConnect expires, and the
reception by Line_P of S_PDOk before the action horizon (3 ticks) of D_PDisconnect expires, trigger
the simultaneous generation of E_APS_NStarted. This is consistent with the result of Figure 10.



6 Conclusion 

The contribution of this paper is a new formal approach to the
specification and formal verification of reactive systems. Its originality lies in
considering each component of a reactive system as a Reactive Decisional Agent, and in
bringing together several formal synchronous modeling and validation tools. With its top-down
process and its principles of decomposition, this method yields a model
that is more easily understood by the user. STATECHARTS models are
used to describe the reactive agent behaviors. These behaviors are
checked qualitatively with the synchronous language
ESTEREL and quantitatively by Real Time Temporal Logic deduction. The mechanism of
the action horizon, the time during which an agent’s decision remains valid, is moreover
useful for specifying temporal performance.

The resulting model can be useful for any application that needs to
include one or several reactive components. The specification and verification of the
Automatic Switching Protection System has illustrated the validity and usefulness of the
approach.

Appendix: Verification of Formula (11) by RTTL Deduction

$\mathrm{A\_Connect\_Line} \rightarrow \mathrm{X\_To\_Connect}$ (1)

$\mathrm{X\_To\_Connect} \rightarrow \mathrm{D\_Connect} \land \mathrm{O\_Wait\_Connection}$ (2)

as $\mathrm{D\_Connect} \rightarrow \Diamond_{\le 4}\, \mathrm{S\_COk}$ (4)

and $\mathrm{X\_To\_Connect} \land \mathrm{O\_Wait\_Connection} \land \mathrm{S\_COk} \rightarrow \mathrm{I\_Connected} \land \mathrm{E\_Line\_Connected}$ (3)

$\mathrm{A\_Connect\_Line} \rightarrow \Diamond_{\le 4}\, \mathrm{E\_Line\_Connected}$ (11)

The same checking steps can be used for (12) and (13).

Abstract. The motivation for investigating the transformation between
the EMYCIN model and the PROSPECTOR model lies in a realistic con- 
sideration. In the past, expert systems exploited mainly the EMYCIN 
model and the PROSPECTOR model to deal with uncertainties. In other 
words, a lot of stand-alone expert systems which use these two models 
are available. If there are reasonable transformations of uncertainties be- 
tween the EMYCIN model and the PROSPECTOR model, we can use 
the Internet to couple them together so that the integrated systems are 
able to exchange and share helpful information with each other, thereby 
improving their performance through cooperation. In this paper, we dis- 
covered a class of exactly isomorphic transformations between uncertain 
reasoning models used by EMYCIN and PROSPECTOR. More interest- 
ingly, among the class of isomorphic transformation functions, different 
ones can handle different degrees to which domain experts are optimistic 
or pessimistic if they perform such a transformation task. 



Keywords: Multi-agent, uncertainty, distributed expert system, algebra.

1 Introduction 

The problem-solving ability of expert systems is greatly improved through co- 
operation among different expert systems in a distributed expert system [6]. 
Sometimes these different expert systems may use different uncertain reasoning 
models [11]. In each reasoning model, the uncertainties of propositions take val- 
ues on a set. These sets are different in different models. For example, the set is 
the interval [-1, 1] in the EMYCIN model [7,8,4], while the set is the interval
[0, 1] in the PROSPECTOR model [1]. So, to achieve cooperation among these
expert systems, the first step is to transform the uncertainty of a proposition
from one uncertain reasoning model to another if they use different uncertain
reasoning models [9, 10], and the second step is to synthesize the different
transformed results [14]. In other words, the transformation among different uncertain
reasoning models is the foundation for cooperation among these heterogeneous
expert systems, and so this is a very important and very interesting problem.

Recently, a few papers have addressed this topic. Zhang and
Orlowska [13] showed that the sets of propositional uncertainties in several well-known
uncertain reasoning models, with appropriate operators, are semi-groups
with individual unit elements. The further work of Zhang [10] used this result
to establish transformation criteria based on homomorphisms, and to define
transformation functions that approximately satisfy these criteria. These functions
work well between any two of the uncertain reasoning models used by EMYCIN,
PROSPECTOR and MYCIN [8]. Hajek [2] also tried to build an isomorphism
between the models used by EMYCIN and PROSPECTOR, but he implicitly
assumed that in the PROSPECTOR model the unit element is always 0.5.
Unfortunately, the unit element is the prior probability of a proposition, and so
varies with different propositions. In [12], for the case where the unit element in
the PROSPECTOR model can take any value in [0, 1], we give an isomorphic
transformation function between the EMYCIN and PROSPECTOR models.

Based on the work [12], this paper further constructs a class of the isomor- 
phic transformation functions, which can exactly transform the uncertainties 
between the EMYCIN and PROSPECTOR models under the condition that in 
the PROSPECTOR model the unit element can take any values on [0, 1]. 

Intuitively, a value representing belief would be transformed into a bigger 
value by domain experts with an optimistic view than by experts with a pes- 
simistic view, while a value representing disbelief would be transformed into a 
smaller value by domain experts with an optimistic view than by experts with a 
pessimistic view. The significance of our class of isomorphic transformations is 
that they can handle such nice intuitions. 

The motivation of investigating the transformation between the EMYCIN 
model and the PROSPECTOR model lies in a realistic consideration. In the 
past, expert systems exploited mainly the EMYCIN model and the PROSPEC- 
TOR model to deal with uncertainties. In other words, a lot of independent sys- 
tems using these two models pre-exist. If there are reasonable transformations 
of uncertainties between the EMYCIN model and the PROSPECTOR model so 
that the models can share heterogeneous information, we can use the Internet 
to couple these systems together so that their problem solving capability is ex- 
tended by sharing helpful information among each other. The work of Jennings 
[3] provides a multi-agent architecture for cooperation among possibly preexist- 
ing and independent systems, but the problem of sharing information between 
the EMYCIN-style system and the PROSPECTOR-style system has not been 
addressed. 

The rest of the paper is organized as follows. In Section 2, we show that
information sharing among heterogeneous uncertain reasoning models in rule-based
systems can be based on homomorphic (especially isomorphic) mappings
between the algebraic structures of the uncertain reasoning models. In Section 3,
we briefly review the algebraic structures of the uncertain reasoning models used by
EMYCIN and PROSPECTOR. In Section 4, under the criteria, we discover a
class of isomorphic functions which can exactly transform the uncertainties
of a proposition between the EMYCIN and PROSPECTOR models for any
value of the prior probability of this proposition. This solves one of the key
problems in the area of distributed expert systems, which are special multi-agent
systems, because it offers a perfect solution for cooperation among different
expert systems using the EMYCIN model and the PROSPECTOR model. In the
last section, we summarize the paper.

2 The Criteria for Transformations 

A multi-agent system can be regarded as a loosely coupled network of autonomous
entities called agents, which have individual capabilities, knowledge
and resources, and which interact to share their knowledge and resources and
to solve problems beyond their individual capabilities. In a multi-agent
system, if different intelligent agents use different uncertain reasoning models,
then in order to share information the uncertainty value of a proposition needs
to be transformed from one model to another when these agents cooperate to
solve problems. This section considers how to judge which transformations are
reasonable.

In an uncertain reasoning model, the propagation of uncertainties depends
on five operations: AND, OR, NOT, IMPLY, and parallel combination. The
parallel combination operation is specific to uncertain reasoning models because
uncertain reasoning concerns the degree of uncertainty. This operation is
used to combine the uncertainties of the same proposition from different sources.

Intuitively, the order between transformation and parallel combination should
be irrelevant. Suppose that in a multi-agent system there are three agents $ES_1$, $ES_2$
and $ES_3$; $ES_1$ and $ES_2$ employ the same uncertain reasoning model, and $ES_3$ uses
a different one. The following events are possible:

1) $ES_1$ and $ES_2$ each output to $ES_3$ a piece of uncertainty
information about the same proposition $H$. That is, first transform these two
pieces of information from the uncertain model (used by $ES_1$ and $ES_2$) into
the other model (used by $ES_3$), and then perform a parallel combination in
the model used by $ES_3$.

2) Suppose in $ES_1$ there are two rules as follows:

$H_1 \rightarrow H$, $H_2 \rightarrow H$.

There is enough information to allow $ES_1$ to use the above two rules and
get two pieces of uncertainty information about the proposition $H$. Clearly,
$ES_1$ should first perform a parallel combination on these two pieces of
information, then transform the result from the model used by $ES_1$ to the model
used by $ES_3$, and finally output it to $ES_3$.

In the above two events, if their two pairs of information about H are the 
same, evidently the results which ES3 obtained should be the same. That is, 
the result of transformation after parallel combination should be the same as 
that of parallel combination after transformation. In other words, the parallel 
combination operation should be preserved under the transformation function. 

In an uncertain reasoning model, the set of all possible estimates for the
uncertainty of propositions should contain the following three special elements:
$\top$, $\bot$ and $e$.

1) $\top$ represents that proposition $H$ is known to be true, e.g. in the EMYCIN
model $\top = 1$, and in the PROSPECTOR model $\top = 1$ as well;

2) $\bot$ represents that proposition $H$ is known to be false, e.g. in the EMYCIN
model $\bot = -1$, while in the PROSPECTOR model $\bot = 0$; and

3) $e$, called the unit element, represents the uncertainty of the proposition
$H$ without observations, e.g. in the EMYCIN model $e = 0$, while in the
PROSPECTOR model $e = P(H)$.

Obviously, when transforming uncertainty estimates from one uncertain model to
another, these special values should correspond to each other.

In an uncertain reasoning model, the set of estimates for uncertainties of
propositions is an ordered set. Obviously, when transforming uncertainty estimates
from one uncertain model to another, the order relations should be preserved.

Having discussed the intuitions, we can now define the criteria formally. In two
different uncertain reasoning models, 1) let the sets of possible uncertainty values
of $H$ be $U_1$ and $U_2$, respectively; 2) let the order relationships on $U_1$ and $U_2$
be $\le_1$ and $\le_2$, respectively; 3) let the uncertainty be described by $\top_1$ and $\top_2$,
respectively, when proposition $H$ is known to be true, and by $\bot_1$ and $\bot_2$,
respectively, when false; 4) let the parallel combination operators on $U_1 - \{\top_1, \bot_1\}$
and on $U_2 - \{\top_2, \bot_2\}$ be $\oplus_{U_1}$ and $\oplus_{U_2}$, respectively; and 5) let the unit elements
of $H$ be $e_1$ and $e_2$, respectively.

Definition 1 A map E : Ui — ^ IJ2 is said to be an h-transformation from 

(Cl - {Ti,±i},©(7j to {U2- {T2,-L2},®U2), if it satisfies 

1- T{®uA^,y)) = ®u^{r{x),T{y)), \/x,y eUi - {Ti,Li}; 

2. r{Ti) = T 2 ; 

3. r{Li) = ± 2 ; 

f. T{ei) = 62 ; 

5 . Mxi,X2 G O - {T 1, J_i}, if xi <1 X2, then r{xi) <2 T{x2). 



In the above definition, Item 1 is the preservation of parallel operations. 
Items 2, 3 and 4 are the corresponding relationships of special elements. Item 5 is 
the preservation of order relationships. 

Before further discussion, it is useful to recall several basic concepts in 
algebra. If X is a set of some elements, and the operation $\oplus_X$ is performed on X, 
then the pair $(X, \oplus_X)$ is called an algebra structure. Let $(X, \oplus_X)$ and 
$(Y, \oplus_Y)$ be two algebra structures. A mapping $f : X \to Y$ is called a 
homomorphism if 

$$f(\oplus_X(x_1, x_2)) = \oplus_Y(f(x_1), f(x_2)), \quad \forall x_1, x_2 \in X.$$

Furthermore, if f is a 1-1 mapping, it is called an isomorphism between 
$(X, \oplus_X)$ and $(Y, \oplus_Y)$. 

Therefore, an h-transformation between two uncertain reasoning 
models is actually a homomorphism between the two algebra structures corresponding 
to these two uncertain reasoning models. 
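The homomorphism condition can be spot-checked numerically on sample points. The following is a minimal Python sketch and not part of the original development: the helper name `is_homomorphism` and the two example structures (the reals under addition and the positive reals under multiplication) are assumptions chosen only for illustration. A check on finitely many samples gives evidence, not a proof.

```python
import math
from itertools import product


def is_homomorphism(f, op_x, op_y, samples, tol=1e-9):
    """Check f(op_x(a, b)) == op_y(f(a), f(b)) on every pair of samples."""
    for a, b in product(samples, repeat=2):
        lhs = f(op_x(a, b))
        rhs = op_y(f(a), f(b))
        if not math.isclose(lhs, rhs, rel_tol=tol, abs_tol=tol):
            return False
    return True


def add(a, b):
    return a + b


def mul(a, b):
    return a * b


samples = [-2.0, -0.5, 0.0, 0.3, 1.7]

# exp maps (R, +) into ((0, inf), *) and preserves the operation.
exp_ok = is_homomorphism(math.exp, add, mul, samples)

# x -> x + 1 does not preserve the operation.
shift_ok = is_homomorphism(lambda x: x + 1.0, add, mul, samples)
```

The same helper can be pointed at any candidate transformation and pair of combination operators to test Item 1 of Definition 1 on sampled values.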



3 EMYCIN and PROSPECTOR Algebra Structures 

Since the criteria for reasonable transformations are based on the algebra 
structures corresponding to uncertain reasoning models, this section discusses the 
algebra structures of the EMYCIN model and the PROSPECTOR model before we 
construct transformation functions between them. 



3.1 The EMYCIN Algebra 

In the EMYCIN model, the set X of uncertainties of any proposition is the 
interval [-1, 1], and the combination operation $\oplus_{CF}$ on (-1, 1) is defined as 

$$\oplus_{CF}(CF(H, S_1), CF(H, S_2)) = CF(H, S_1 \wedge S_2),$$

where $CF(H, S_1 \wedge S_2)$ is given by 

$$CF(H, S_1 \wedge S_2) = \begin{cases}
CF(H, S_1) + CF(H, S_2) - CF(H, S_1)\,CF(H, S_2) & \text{if } CF(H, S_1) \ge 0,\ CF(H, S_2) \ge 0, \\
CF(H, S_1) + CF(H, S_2) + CF(H, S_1)\,CF(H, S_2) & \text{if } CF(H, S_1) < 0,\ CF(H, S_2) < 0, \\
\dfrac{CF(H, S_1) + CF(H, S_2)}{1 - \min\{|CF(H, S_1)|, |CF(H, S_2)|\}} & \text{if } CF(H, S_1) \times CF(H, S_2) < 0.
\end{cases}$$



Theorem 1 $((-1, 1), \oplus_{CF})$ is a group. 



Proof. It is easy to verify that the operator $\oplus_{CF}$ on (-1, 1) is closed, and 
satisfies the associative and commutative laws. The unit element is 0 and the 
inverse element of x is -x [9]. So $((-1, 1), \oplus_{CF})$ is a group. □ 

We describe the above group in an abstract way as follows: 

1) set: (-1, 1); 

2) operator $\oplus_{CF} : (-1, 1) \times (-1, 1) \to (-1, 1)$ is given by: 

$$\oplus_{CF}(x_1, x_2) = \begin{cases}
x_1 + x_2 - x_1 x_2 & \text{if } x_1 \ge 0,\ x_2 \ge 0, \\
x_1 + x_2 + x_1 x_2 & \text{if } x_1 < 0,\ x_2 < 0, \\
\dfrac{x_1 + x_2}{1 - \min\{|x_1|, |x_2|\}} & \text{if } x_1 \times x_2 < 0;
\end{cases}$$

3) unit element is 0; 

4) $\forall x \in (-1, 1)$, inverse element of x: 

$$x^{-1} = -x.$$
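The group properties listed above can be exercised directly. The sketch below is an illustration under stated assumptions rather than code from the paper: the function names `combine_cf` and `inverse_cf` are hypothetical, and a zero operand paired with a negative one is routed through the second branch, which returns the other operand as expected for the unit element.

```python
def combine_cf(x1: float, x2: float) -> float:
    """Parallel combination of two certainty factors taken from (-1, 1)."""
    if x1 >= 0 and x2 >= 0:
        return x1 + x2 - x1 * x2
    if x1 <= 0 and x2 <= 0:
        # Also reached when one operand is 0 and the other is negative.
        return x1 + x2 + x1 * x2
    # Operands of opposite sign.
    return (x1 + x2) / (1 - min(abs(x1), abs(x2)))


def inverse_cf(x: float) -> float:
    """Inverse element with respect to combine_cf."""
    return -x


# Combining with the unit element 0 leaves a value unchanged.
unit_result = combine_cf(0.5, 0.0)

# Combining a value with its inverse gives the unit element.
inverse_result = combine_cf(0.4, inverse_cf(0.4))

# Associativity on one mixed-sign triple.
left = combine_cf(combine_cf(0.5, -0.3), 0.2)
right = combine_cf(0.5, combine_cf(-0.3, 0.2))
```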



3.2 The PROSPECTOR Algebra 

In the PROSPECTOR model, the set of uncertainties of any proposition H is 
the interval $(0, \infty)$, and the combination function $\oplus_O$ on $(0, \infty)$ is defined as 

$$\oplus_O(O(H|S_1), O(H|S_2)) = O(H|S_1 \wedge S_2), \qquad (2)$$

where $O(H|S_1 \wedge S_2)$ is given by 

$$O(H|S_1 \wedge S_2) = \frac{O(H|S_1)\,O(H|S_2)}{O(H)}. \qquad (3)$$

In the above combination function, O represents odds. The relationship between 
odds and probability is given by 

$$O(x) = \frac{P(x)}{1 - P(x)}. \qquad (4)$$



By using the relationship formula (4), we can turn (3) into (6). In other words, we 
can transform the PROSPECTOR model from a form of odds to an intermediate 
form of probability: the set of uncertainties of any proposition H is (0, 1), on 
which the combination function $\oplus_P$ is defined as 

$$\oplus_P(P(H|S_1), P(H|S_2)) = P(H|S_1 \wedge S_2), \qquad (5)$$

where $P(H|S_1 \wedge S_2)$ is given by 

$$P(H|S_1 \wedge S_2) = \frac{P(H|S_1)\,P(H|S_2)\,P(\neg H)}{P(\neg H|S_1)\,P(\neg H|S_2)\,P(H) + P(H|S_1)\,P(H|S_2)\,P(\neg H)}. \qquad (6)$$

Theorem 2 $((0, \infty), \oplus_O)$ is a group. 

Proof. We can verify that the operator $\oplus_O$ on $(0, \infty)$ is closed, associative, 
and commutative [9]. Moreover, the unit element exists, and for any element 
$x \in (0, \infty)$, its inverse element $x^{-1}$ exists [9]. Thus $((0, \infty), \oplus_O)$ is a group. □ 

We can describe the group $(Y, \oplus_O)$ in an abstract way as follows: 

1) set: $(0, \infty)$; 

2) operator $\oplus_O : (0, \infty) \times (0, \infty) \to (0, \infty)$ is defined as 

$$\oplus_O(x_1, x_2) = \frac{x_1 x_2}{O(H)};$$

3) unit element is O(H) for proposition H (constant); 

4) $\forall x \in (0, \infty)$, the inverse element of x: 

$$x^{-1} = \frac{O(H)^2}{x}.$$

In contrast with the EMYCIN model, the operator $\oplus_O$ is related to the unit 
element of proposition H explicitly, because there are different unit elements for 
different propositions in the PROSPECTOR model. 
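The dependence of the odds-form operator on the prior odds O(H) is easy to see in a small sketch. The code below is illustrative only: the names `odds_from_prob`, `combine_odds` and `inverse_odds` are hypothetical, and the inverse is taken to be $O(H)^2 / x$, which follows from requiring that an element combined with its inverse gives the unit element O(H).

```python
def odds_from_prob(p: float) -> float:
    """Relationship (4): odds of an event with probability p in (0, 1)."""
    return p / (1.0 - p)


def combine_odds(o1: float, o2: float, prior_odds: float) -> float:
    """Parallel combination of two posterior odds for one proposition H."""
    return o1 * o2 / prior_odds


def inverse_odds(o: float, prior_odds: float) -> float:
    """Element that combines with o to give the unit element prior_odds."""
    return prior_odds ** 2 / o


prior = odds_from_prob(0.2)  # unit element for a proposition with P(H) = 0.2

# Combining with the unit element leaves the value unchanged.
unit_result = combine_odds(3.0, prior, prior)

# Combining with the inverse returns the unit element.
inverse_result = combine_odds(3.0, inverse_odds(3.0, prior), prior)
```

Changing the prior changes both the unit element and the result of every combination, which is the contrast with the EMYCIN operator noted above.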



Theorem 3 $((0, 1), \oplus_P)$ is a group. 



Proof. We can verify that the operator $\oplus_P$ on (0, 1) is closed, associative, and 
commutative [9]. Moreover, the unit element exists, and for any element 
$x \in (0, 1)$, its inverse element $x^{-1}$ exists [9]. Thus $((0, 1), \oplus_P)$ is a group. □ 

The group $((0, 1), \oplus_P)$ is described as follows: 

1) set: (0, 1); 

2) operator $\oplus_P : (0, 1) \times (0, 1) \to (0, 1)$ is defined as: $\forall x_1, x_2 \in (0, 1)$, 

$$\oplus_P(x_1, x_2) = \frac{x_1 x_2 (1 - P(H))}{(1 - x_1)(1 - x_2) P(H) + x_1 x_2 (1 - P(H))};$$

3) unit element is P(H) (constant); 

4) $\forall x \in (0, 1)$, inverse element of x: 

$$x^{-1} = \frac{(1 - x) P(H)^2}{x (1 - 2P(H)) + P(H)^2}.$$

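Theorem 4 $((0, 1), \oplus_P)$ is a group isomorphic to $((0, \infty), \oplus_O)$. 



Proof. Let a map $f_{O \to P} : (0, \infty) \to (0, 1)$ be 

$$f_{O \to P}(x) = \frac{x}{1 + x}.$$

Then $f_{O \to P}$ is an isomorphism from $((0, \infty), \oplus_O)$ to $((0, 1), \oplus_P)$. In fact, clearly, 
$f_{O \to P}$ is a 1-1 map, and 

$$\begin{aligned}
f_{O \to P}(\oplus_O(x_1, x_2)) &= \frac{x_1 x_2}{O(H) + x_1 x_2} \\
&= \frac{\frac{x_1}{1 + x_1} \times \frac{x_2}{1 + x_2} \times (1 - P(H))}{\left(1 - \frac{x_1}{1 + x_1}\right)\left(1 - \frac{x_2}{1 + x_2}\right) P(H) + \frac{x_1}{1 + x_1} \times \frac{x_2}{1 + x_2} \times (1 - P(H))} \\
&= \oplus_P\left(\frac{x_1}{1 + x_1}, \frac{x_2}{1 + x_2}\right) \\
&= \oplus_P(f_{O \to P}(x_1), f_{O \to P}(x_2)).
\end{aligned}$$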



□ 
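The equality established in this proof can also be checked numerically. The following sketch is not from the paper: the function names are hypothetical, the prior is passed as a probability, and the probability-form operator is written as in equation (6) with $P(\neg H) = 1 - P(H)$.

```python
import math


def combine_odds(o1: float, o2: float, prior_p: float) -> float:
    """Odds-form combination; the prior odds are derived from prior_p."""
    prior_odds = prior_p / (1.0 - prior_p)
    return o1 * o2 / prior_odds


def combine_prob(x1: float, x2: float, prior_p: float) -> float:
    """Probability-form combination, as in equation (6)."""
    num = x1 * x2 * (1.0 - prior_p)
    return num / ((1.0 - x1) * (1.0 - x2) * prior_p + num)


def odds_to_prob(o: float) -> float:
    """The map x -> x / (1 + x) from (0, inf) to (0, 1)."""
    return o / (1.0 + o)


def combination_is_preserved(o1: float, o2: float, prior_p: float) -> bool:
    """Compare mapping after combining with combining after mapping."""
    lhs = odds_to_prob(combine_odds(o1, o2, prior_p))
    rhs = combine_prob(odds_to_prob(o1), odds_to_prob(o2), prior_p)
    return math.isclose(lhs, rhs, rel_tol=1e-9)


all_ok = all(
    combination_is_preserved(o1, o2, p)
    for p in (0.2, 0.5, 0.8)
    for o1 in (0.1, 1.0, 4.0)
    for o2 in (0.3, 2.5)
)
```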
4 A Class of Isomorphic Transformations 

After discussing the algebra structures of the EMYCIN model and the PROSPECTOR 
model, we can now give transformation functions between the models. 

Lemma 1 The map 

$$f(x) = \begin{cases}
1 - 2^{-x} & \text{if } x \ge 0, \\
2^{x} - 1 & \text{if } x < 0
\end{cases}$$

is an isomorphism from $((-\infty, \infty), +)$ to $((-1, 1), \oplus_{CF})$. 

Hajek [2] gave the above lemma, which tells us that the set of all reals 
$(-\infty, \infty)$, with the usual addition +, is isomorphic to the EMYCIN model 
$((-1, 1), \oplus_{CF})$. Hajek [2] also tried to give an isomorphism from $((-\infty, \infty), +)$ to 
the PROSPECTOR model. Unfortunately, his solution was correct only in a very 
special case, that is, P(H) = 0.5. In the following lemma, we give an isomorphism 
from $((-\infty, \infty), +)$ to the PROSPECTOR model for the general case in which 
P(H) may be any value in (0, 1). 

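Lemma 2 The map 

$$f(x) = \frac{2^{\alpha x} \times P(H)}{1 - P(H) + 2^{\alpha x} \times P(H)} \qquad (8)$$

is an isomorphism from the group $((-\infty, \infty), +)$ to the group $((0, 1), \oplus_P)$, 
where $\alpha \in (0, \infty)$ is a constant. 



Proof. Clearly, f is a 1-1 map. 

$$\begin{aligned}
\oplus_P(f(x_1), f(x_2)) &= \frac{f(x_1) f(x_2) (1 - P(H))}{(1 - f(x_1))(1 - f(x_2)) P(H) + f(x_1) f(x_2) (1 - P(H))} \\
&= \frac{1}{\dfrac{(1 - f(x_1))(1 - f(x_2)) P(H)}{f(x_1) f(x_2) (1 - P(H))} + 1} \\
&= \frac{1}{\dfrac{1 - P(H)}{2^{\alpha x_1} \times P(H)} \times \dfrac{1 - P(H)}{2^{\alpha x_2} \times P(H)} \times \dfrac{P(H)}{1 - P(H)} + 1} \\
&= \frac{2^{\alpha (x_1 + x_2)} \times P(H)}{1 - P(H) + 2^{\alpha (x_1 + x_2)} \times P(H)} \\
&= f(x_1 + x_2).
\end{aligned}$$

Therefore, the lemma holds. 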



□ 
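As a numerical companion to the proof, the sketch below checks $f(x_1 + x_2) = \oplus_P(f(x_1), f(x_2))$ on a small grid for several values of $\alpha$ and P(H). It is an illustration only; the function names `real_to_prob` and `combine_prob` and the chosen grid are assumptions.

```python
import math


def real_to_prob(x: float, prior_p: float, alpha: float = 1.0) -> float:
    """The map of Lemma 2 from the reals into (0, 1)."""
    t = 2.0 ** (alpha * x) * prior_p
    return t / (1.0 - prior_p + t)


def combine_prob(x1: float, x2: float, prior_p: float) -> float:
    """Probability-form PROSPECTOR combination."""
    num = x1 * x2 * (1.0 - prior_p)
    return num / ((1.0 - x1) * (1.0 - x2) * prior_p + num)


def addition_is_preserved(x1, x2, prior_p, alpha) -> bool:
    """Compare f(x1 + x2) with the combination of f(x1) and f(x2)."""
    lhs = real_to_prob(x1 + x2, prior_p, alpha)
    rhs = combine_prob(
        real_to_prob(x1, prior_p, alpha),
        real_to_prob(x2, prior_p, alpha),
        prior_p,
    )
    return math.isclose(lhs, rhs, rel_tol=1e-9)


grid = (-3.0, -0.5, 0.0, 1.2, 3.0)
all_ok = all(
    addition_is_preserved(x1, x2, p, a)
    for p in (0.2, 0.5, 0.8)
    for a in (0.5, 1.0, 2.0)
    for x1 in grid
    for x2 in grid
)
```

Note that the real number 0, the unit of addition, is sent to P(H), the unit of $\oplus_P$, for every $\alpha$.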



Lemma 3 Let $f_1$ be an isomorphism from the group $(G_1, \oplus_1)$ to the group 
$(G_2, \oplus_2)$, and let $f_2$ be an isomorphism from the group $(G_2, \oplus_2)$ to the group 
$(G_3, \oplus_3)$. Then $f_1^{-1}$ is an isomorphism from $(G_2, \oplus_2)$ to $(G_1, \oplus_1)$, and $f_2 \circ f_1$ is 
an isomorphism from $(G_1, \oplus_1)$ to $(G_3, \oplus_3)$. 

This is a basic fact in modern algebra [5]. 

Theorem 5 The map 

$$f_{CF \to P}(x) = \begin{cases}
\dfrac{P(H)}{(1 - x)^{\alpha} \times (1 - P(H)) + P(H)} & \text{if } 1 > x \ge 0, \\
\dfrac{(1 + x)^{\alpha} \times P(H)}{1 - P(H) + (1 + x)^{\alpha} \times P(H)} & \text{if } 0 > x > -1
\end{cases} \qquad (9)$$

is an isomorphism from $((-1, 1), \oplus_{CF})$ to $((0, 1), \oplus_P)$. 



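Proof. By Lemmas 1 and 3, we know that the map 

$$g(x) = \begin{cases}
-\log_2(1 - x) & \text{if } 1 > x \ge 0, \\
\log_2(1 + x) & \text{if } 0 > x > -1
\end{cases}$$

is an isomorphism from $((-1, 1), \oplus_{CF})$ to $((-\infty, \infty), +)$. And by Lemmas 2 and 
3, we know that an isomorphism from $((-1, 1), \oplus_{CF})$ to $((0, 1), \oplus_P)$ is as follows: 

$$\begin{aligned}
f_{CF \to P}(x) = f \circ g(x)
&= \begin{cases}
\dfrac{2^{\alpha(-\log_2(1 - x))} \times P(H)}{1 - P(H) + 2^{\alpha(-\log_2(1 - x))} \times P(H)} & \text{if } 1 > x \ge 0, \\
\dfrac{2^{\alpha \log_2(1 + x)} \times P(H)}{1 - P(H) + 2^{\alpha \log_2(1 + x)} \times P(H)} & \text{if } 0 > x > -1
\end{cases} \\
&= \begin{cases}
\dfrac{(1 - x)^{-\alpha} \times P(H)}{1 - P(H) + (1 - x)^{-\alpha} \times P(H)} & \text{if } 1 > x \ge 0, \\
\dfrac{(1 + x)^{\alpha} \times P(H)}{1 - P(H) + (1 + x)^{\alpha} \times P(H)} & \text{if } 0 > x > -1
\end{cases} \\
&= \begin{cases}
\dfrac{P(H)}{(1 - x)^{\alpha} \times (1 - P(H)) + P(H)} & \text{if } 1 > x \ge 0, \\
\dfrac{(1 + x)^{\alpha} \times P(H)}{1 - P(H) + (1 + x)^{\alpha} \times P(H)} & \text{if } 0 > x > -1.
\end{cases}
\end{aligned}$$

□ 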

Notice that $f_{CF \to P}(1) = 1$, $f_{CF \to P}(-1) = 0$, $f_{CF \to P}(0) = P(H)$, and 
$f_{CF \to P}(x)$ is monotonically increasing. Thus, by Definition 1, the above 
theorem gives h-transformations from the EMYCIN model to the PROSPECTOR 
model. 
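The five conditions of Definition 1 can be spot-checked for the map of Theorem 5. The sketch below is a hedged illustration, not the authors' implementation: all function names are hypothetical, the sample grid is arbitrary, and the boundary handling of the EMYCIN operator at zero is an assumption (a zero operand simply returns the other operand).

```python
import math


def combine_cf(x1, x2):
    """EMYCIN parallel combination on (-1, 1)."""
    if x1 >= 0 and x2 >= 0:
        return x1 + x2 - x1 * x2
    if x1 <= 0 and x2 <= 0:
        return x1 + x2 + x1 * x2
    return (x1 + x2) / (1 - min(abs(x1), abs(x2)))


def combine_prob(x1, x2, prior_p):
    """PROSPECTOR parallel combination in probability form."""
    num = x1 * x2 * (1.0 - prior_p)
    return num / ((1.0 - x1) * (1.0 - x2) * prior_p + num)


def cf_to_prob(x, prior_p, alpha=1.0):
    """Map a certainty factor in [-1, 1] to a probability, as in (9)."""
    if x >= 0:
        return prior_p / ((1.0 - x) ** alpha * (1.0 - prior_p) + prior_p)
    t = (1.0 + x) ** alpha * prior_p
    return t / (1.0 - prior_p + t)


def preserves_combination(x1, x2, prior_p, alpha):
    """Item 1 of Definition 1 on one pair of certainty factors."""
    lhs = cf_to_prob(combine_cf(x1, x2), prior_p, alpha)
    rhs = combine_prob(
        cf_to_prob(x1, prior_p, alpha), cf_to_prob(x2, prior_p, alpha), prior_p
    )
    return math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-12)


grid = (-0.8, -0.3, 0.0, 0.5, 0.9)
item1_ok = all(
    preserves_combination(a, b, p, alpha)
    for p in (0.2, 0.5, 0.8)
    for alpha in (0.5, 1.0, 2.0)
    for a in grid
    for b in grid
)

# Item 5: the images of an increasing grid are increasing.
images = [cf_to_prob(x, 0.3, 2.0) for x in grid]
item5_ok = images == sorted(images)
```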

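By the above theorem, when $\alpha = 1$ we have: 



Corollary 1. The map 

$$f_{CF \to P}(x) = \begin{cases}
\dfrac{P(H)}{1 - x \times (1 - P(H))} & \text{if } 1 > x \ge 0, \\
\dfrac{(1 + x) \times P(H)}{1 + x \times P(H)} & \text{if } 0 > x > -1
\end{cases} \qquad (10)$$

is an isomorphism from $((-1, 1), \oplus_{CF})$ to $((0, 1), \oplus_P)$. 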

This is the result of [12]. For this mapping, we draw its figures, as shown in 
Figure 1, in some cases where P(H) takes different values. 

Fig. 1. Isomorphism from the EMYCIN model to the PROSPECTOR model 




In [2], Hajek gives only the one for the case P(H) = 0.5 among the many 
transformation functions in Figure 1. 

For the same value of P(H), when $\alpha$ takes different values the corresponding 
isomorphisms are different. This is reasonable because different human experts 
have individual attitudes in the transformation from the EMYCIN model to the 
PROSPECTOR model. In Figure 2, we draw some figures of this isomorphism 
when $\alpha$ takes different values for the same value of P(H). 

Fig. 2(a). When P(H)=0.2 the comparison of isomorphisms from the EMYCIN 
model to the PROSPECTOR model 

Fig. 2(b). When P(H)=0.5 the comparison of isomorphisms from the EMYCIN 
model to the PROSPECTOR model 

Fig. 2(c). When P(H)=0.8 the comparison of isomorphisms from the EMYCIN 
model to the PROSPECTOR model 




From the comparison in Figure 2, we can see that our family of isomorphisms 
can capture some nice intuitions of human experts. 

- In real life, there are some persons who are positive. That is, when they are 
in a good situation they do not feel the situation is so good and still handle 
things carefully, while when they are in a bad situation they do not regard the 
situation as so bad and are confident of handling the problem. When $\alpha < 1$, the 
isomorphisms can capture such an attitude. In fact, in the case 
$\alpha < 1$, for the same value of P(H), as the value of $\alpha$ gets smaller, the 
transformed value representing belief in H also gets smaller, whereas the 
transformed value representing disbelief in H gets bigger. Intuitively, 
a value representing belief would be transformed into a smaller value (i.e. 
less belief) by a domain expert when he/she is more cautious, while a 
value representing disbelief would be transformed into a bigger value (i.e. 
less disbelief) by a domain expert when he/she is more confident. 

- In real life, there are some negative persons. That is, when they are in a good 
situation they feel the situation is better than it really is, while when they 
are in a bad situation they feel the situation is worse than it really is. When 
$\alpha > 1$, the isomorphisms can capture such an attitude. In fact, in 
the case $\alpha > 1$, for the same value of P(H), as the value of $\alpha$ gets 
bigger, the transformed value representing belief in H also gets bigger, 
whereas the transformed value representing disbelief in H gets smaller. 
Intuitively, a value representing belief would be transformed into a bigger 
value (i.e. more belief) by a domain expert when he/she is less cautious, 
while a value representing disbelief would be transformed into a smaller value 
(i.e. more disbelief) by a domain expert when he/she is less confident. 

Accordingly, the value of $\alpha$ can be regarded as representing the degree 
to which domain experts are positive or negative. In summary, 1) 
$\alpha < 1$ means the domain expert is positive, and the smaller $\alpha$, the more positive 
the domain expert; 2) $\alpha > 1$ means the domain expert is negative, and the bigger 
$\alpha$, the more negative the domain expert; 3) $\alpha = 1$ means the domain expert is 
neutral. 
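The effect of $\alpha$ can be made concrete with a few values. The sketch below is illustrative only: it re-implements the map of Theorem 5 under the hypothetical name `cf_to_prob`, and the sample certainty factors (0.6 and -0.6), the prior P(H) = 0.5 and the three values of $\alpha$ are arbitrary choices.

```python
def cf_to_prob(x: float, prior_p: float, alpha: float) -> float:
    """Map of Theorem 5 from a certainty factor to a probability."""
    if x >= 0:
        return prior_p / ((1.0 - x) ** alpha * (1.0 - prior_p) + prior_p)
    t = (1.0 + x) ** alpha * prior_p
    return t / (1.0 - prior_p + t)


prior_p = 0.5
alphas = (0.5, 1.0, 2.0)  # positive, neutral, negative attitude

# The same belief (CF = 0.6) maps to a larger probability as alpha grows.
belief = [cf_to_prob(0.6, prior_p, a) for a in alphas]

# The same disbelief (CF = -0.6) maps to a smaller probability as alpha grows.
disbelief = [cf_to_prob(-0.6, prior_p, a) for a in alphas]
```

With these inputs the neutral expert ($\alpha = 1$) maps belief 0.6 to 0.5 / 0.7, which lies between the positive and the negative expert's values.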



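Theorem 6 The map 

$$f_{P \to CF}(x) = \begin{cases}
1 - \left(\dfrac{(1 - x) \times P(H)}{x \times (1 - P(H))}\right)^{1/\alpha} & \text{if } 1 > x \ge P(H), \\
\left(\dfrac{x \times (1 - P(H))}{(1 - x) \times P(H)}\right)^{1/\alpha} - 1 & \text{if } P(H) > x > 0
\end{cases} \qquad (11)$$

is an isomorphism from $((0, 1), \oplus_P)$ to $((-1, 1), \oplus_{CF})$. 

Proof. Note that $f_{P \to CF} = f_{CF \to P}^{-1}$. By Lemma 3, the conclusion holds. □ 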

Notice that $f_{P \to CF}(1) = 1$, $f_{P \to CF}(0) = -1$, $f_{P \to CF}(P(H)) = 0$, and 
$f_{P \to CF}(x)$ is monotonically increasing. Thus, by Definition 1, the above 
theorem gives h-transformations from the PROSPECTOR model to the EMYCIN 
model. 
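Since the map of Theorem 6 is the inverse of the map of Theorem 5, a round trip should return the original certainty factor. The sketch below is a hedged illustration: the closed form used in `prob_to_cf` is obtained by solving the map of Theorem 5 for x, and both function names are hypothetical.

```python
import math


def cf_to_prob(x: float, prior_p: float, alpha: float = 1.0) -> float:
    """Map of Theorem 5 from a certainty factor to a probability."""
    if x >= 0:
        return prior_p / ((1.0 - x) ** alpha * (1.0 - prior_p) + prior_p)
    t = (1.0 + x) ** alpha * prior_p
    return t / (1.0 - prior_p + t)


def prob_to_cf(p: float, prior_p: float, alpha: float = 1.0) -> float:
    """Inverse map: a probability in [0, 1] back to a certainty factor."""
    if p >= prior_p:
        ratio = (1.0 - p) * prior_p / (p * (1.0 - prior_p))
        return 1.0 - ratio ** (1.0 / alpha)
    ratio = p * (1.0 - prior_p) / ((1.0 - p) * prior_p)
    return ratio ** (1.0 / alpha) - 1.0


def round_trip_ok(x: float, prior_p: float, alpha: float) -> bool:
    """Check that prob_to_cf undoes cf_to_prob at one point."""
    back = prob_to_cf(cf_to_prob(x, prior_p, alpha), prior_p, alpha)
    return math.isclose(back, x, rel_tol=1e-9, abs_tol=1e-9)


all_ok = all(
    round_trip_ok(x, p, a)
    for p in (0.2, 0.5, 0.8)
    for a in (0.5, 1.0, 2.0)
    for x in (-0.9, -0.4, 0.0, 0.3, 0.95)
)
```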

By the above theorem, when $\alpha = 1$, we have: 

Corollary 2. The map 

$$f_{P \to CF}(x) = \begin{cases}
\dfrac{x - P(H)}{x \times (1 - P(H))} & \text{if } 1 > x \ge P(H), \\
\dfrac{x - P(H)}{(1 - x) \times P(H)} & \text{if } P(H) > x > 0
\end{cases} \qquad (12)$$

is an isomorphism from $((0, 1), \oplus_P)$ to $((-1, 1), \oplus_{CF})$. 

This is the result of [12]. For this mapping, we draw its figures, as shown in 
Figure 3, in some cases where P(H) takes different values. 

Fig. 3. Isomorphism from the PROSPECTOR model to the EMYCIN model 




For the same value of P(H), when $\alpha$ takes different values the corresponding 
isomorphisms are different. This is reasonable because different human experts 
have individual opinions in the transformation from the PROSPECTOR model to 
the EMYCIN model. In Figure 4, we draw some figures of this isomorphism 
when $\alpha$ takes different values for the same value of P(H). 

Fig. 4(a). When P(H)=0.2 the comparison of isomorphisms from the 
PROSPECTOR model to the EMYCIN model 

Fig. 4(b). When P(H)=0.5 the comparison of isomorphisms from the 
PROSPECTOR model to the EMYCIN model 




Fig. 4(c). When P(H)=0.8 the comparison of isomorphisms from the 
PROSPECTOR model to the EMYCIN model 




The analysis of Figure 4 is similar, but the value of $\alpha$ has a different 
meaning, as follows: 1) $\alpha = 1$ means the view of domain experts is neutral. 2) 
$\alpha > 1$ means the view of domain experts is positive. 3) $\alpha < 1$ means the view of 
domain experts is negative. The smaller the value of $\alpha$, the more negative the 
domain experts. 

5 Conclusions 

The transformation among uncertain reasoning models is the foundation for a 
distributed heterogeneous expert system. We construct a class of isomorphic 
transformations which can exactly transform the uncertainties of a proposition 
between the EMYCIN model and the PROSPECTOR model for any value of 
the prior probability of the proposition. This solves one of the key problems in 
the area of distributed expert systems. Besides, among the class of isomorphic 
transformation maps, different ones can handle different degrees to which domain 
experts are optimistic or pessimistic when performing a transformation task. 



Abstract. In this paper, we propose an agent architecture that improves the 
flexibility of a videoconference system by means of a strategy-centric adaptive 
QoS (Quality of Service) control mechanism. The proposed architecture realizes 
more flexibility by changing its QoS control strategies dynamically. To switch 
strategies, the system considers the properties of the QoS problems that have 
occurred and the status of the problem solving process. This architecture 
is introduced as a part of the knowledge base of the agents that handle 
cooperation between the software modules of videoconference systems. We have 
implemented the mechanism, and our prototype system shows its capability of 
flexible problem solving against QoS degradation, along with other possible 
problems, within the given time limitation. Thus we confirmed that the proposed 
architecture can improve the flexibility of a videoconference system compared 
to traditional systems. 



1 Introduction 

To use videoconference systems (VCSs) [1]-[4] on heterogeneous computers and 
network environments, users have to consider lots of conditions such as status of 
system resources, situations of other participant’s site over network, sledding of 
a meeting, and working conditions of videoconference processes on machines, to 
maintain the comfortable conference sessions. Usually these tasks burden novice 
users. To reduce these various kinds of loads on users of desktop VCSs, we have 
been developing Flexible Videoconference System (FVCS) [5]- [7], which is a user 
support environment for videoconferencing, as a multiagent system. By adding 
some flexible features provided by agents [8] [9] to traditional VCSs, FVCS can 
change its functions and performances autonomously in accordance with changes 
of user requirements and system/network environments. 

In the research area of adaptive QoS control based on the situation of the 
environment, VCSs that control their outgoing data rate according to the 
congestion of the network have been developed [2]. In such systems, the QoS 
control algorithms are designed to fit a very restricted situation. This makes 
their problem solving capability static; thus, flexible behavior that considers 
many kinds of parameters, such as the importance or urgency of given problems, 
is difficult to achieve. 

To overcome this limitation, we propose in this paper an agent architecture with 
a strategy-centric adaptive QoS control mechanism. Using this mechanism, 
we can deal flexibly with problems that occur in QoS by changing the QoS control 
strategy dynamically based on the characteristics of the problems, the status of 
the problem solving processes, user requirements, and so forth. 
We also show some experimental results from a prototype of FVCS that is 
improved by the proposed architecture. 

In section 2, we explain the basic concept of FVCS. Section 3 then presents 
the model of strategy-centric adaptive QoS control and its architecture. The 
application of proposed model to VCS is also discussed. Finally, we illustrate 
the details of implementation and evaluate the results of experiments using the 
prototype system. 



2 Flexible Videoconference System Project 



The Flexible Videoconference System (FVCS) project [5]-[7] aims at providing a 
user-centered videoconference environment based on the concept of Flexible 
Networks [8]. The primary objective of the project is to reduce the many 
burdens on users of VCSs by making effective use of traditional VCS 
software and the expertise of designers and operators of VCSs. In addition, the 
project has another aspect concerning the methodology of system construction: 
we are developing FVCS as a test bed application of our agent-based 
computing infrastructure, the ADIPS framework. 

To lighten the burden on users of VCSs, the flexibility of FVCS is attained by 
adding the following functions to existing VCSs: 

(F1) Service configuration function at the start of a session: this function 
composes the most suitable service configuration of the VCS automatically by 
selecting the best software modules and deciding their parameters under the 
given conditions of environments and users at the start-up of a videoconference 
session. 

(F2) Service tuning function during the session: this function adjusts the QoS 
provided by FVCS autonomously based on changes in the network/computational 
environments or in user requirements on the QoS. This function is realized by 
two-phase tuning, i.e., parameter operation tuning for small-scale changes and 
reconfiguration of the videoconference service for large-scale changes. 

FVCS has the following characteristic aspects: (1) it performs application-level 
adaptive QoS control; (2) it utilizes existing VCS implementations [1]-[4] 
effectively; (3) it controls QoS parameters considering not only the network status 
but also the load conditions of computer platforms and user requirements; and (4) 
it is constructed as a multiagent system based on the ADIPS (Agent-based 
Distributed Processing System) framework [10][11], the agent-oriented computing 
infrastructure developed by the authors. 



Fig. 1. Architecture of Flexible Videoconference System 



Fig. 1 depicts the agent-based architecture of FVCS. Agents, i.e., intelligent 
software modules in the VCS, and their cooperative behavior in accordance with 
inter-agent communication protocols realize the two functions described above [5]. 

The function (F1) is realized by interactions of User agents, Sensor agents 
and the Agent Repository. In the Agent Repository, many class agents wait 
for requests from agents outside the repository. When a task announcement 
is issued to the repository, class agents are activated and the most suitable agents 
are selected by contract-net based negotiation. The selected agents are 
instantiated and form organizations of Service agents. 

In this paper, we concentrate on the parameter operation tuning of the service 
tuning function (F2) as adaptive QoS control. (F2) is achieved mainly by the 
video-Conf-Manager (VCM) agents in Fig. 1. Each VCM agent maintains the 
videoconference services provided to one specific user. VCM agents frequently 
exchange data with User agents, Sensor agents, and Service agents, and decide 
the action sequence of QoS control for the videoconference process agents: video, 
audio, and whiteboard agents. Moreover, high-level negotiation between VCM agents 
is required to optimize the load balance of the platforms of both user A and user B, 
and to fulfill the requirements of both users. 



3 Strategy-Centric Adaptive QoS Control 

3.1 Related Works 

Some research on application-level QoS control has been undertaken, such 
as IVS [2] and a framework-based approach [12]. IVS was developed for 
videoconferencing over the Internet. IVS adjusts its outgoing data transmission rate 
by controlling the parameters of video coders based on feedback information 
about changing network conditions. It can also accept very simple user 
requirements through a specified policy on QoS control. However, because its control 
mechanism is static, it cannot change its control strategy in response to the variety 
of changes in environments and user requirements. This static property limits its 
QoS control (P1). Moreover, optimization of the load balance considering the status 
of both participants' sites is not supported (P2). 

The framework-based approach, on the other hand, provides a skeleton to address 
two fundamental challenges for the construction of "network aware" applications: 1) 
how to find out about dynamic changes in network service quality and 2) how 
to map application-centric quality measures to network-centric ones. This 
approach takes care of balancing both platforms, but its control mechanism is fixed 
to a specific manner, as in IVS. Furthermore, user requirements, i.e. "user 
awareness" in their words, are not sufficiently taken into consideration. 

In addition to these limitations, in these two systems, QoS control algorithms 
are hard coded in the system, so it is difficult to reuse its advanced features on 
other VCSs (P3). 

3.2 Strategy-Centric Adaptive QoS Control with M-INTER Model 

To overcome the limitations (P1-P3) of traditional QoS control mechanisms described 
in section 3.1, we propose a strategy-centric adaptive QoS control mechanism. 
This mechanism is embedded in the VCM agents of FVCS and achieves the 
service tuning function (F2). 

The strategy-centric adaptive QoS control mechanism is designed along 
the following policies: (1) introducing meta-level knowledge representing the 
characteristics of the problems that occur and the current status of the problem 
solving process; (2) giving a framework to incorporate multiple QoS control 
strategies; (3) building a knowledge processing mechanism to switch the QoS control 
strategies based on the knowledge represented by (1); and (4) constructing these 
mechanisms in the form of software modules to encourage reusability and 
maintainability of agent design and implementation. 

We propose the Meta-Interaction model (M-INTER model) as a new architecture 
of knowledge in agents to accomplish the strategy-centric adaptive QoS control. 

Fig. 2 represents the concept of the M-INTER model. 

Fig. 2. Conceptual scheme of M-INTER model 



With this model, knowledge processing is performed in two different modes 
in agents, namely Strategy Selection Mode and Domain Cooperation Mode. 



(I) Strategy Selection Mode: In this mode, agents monitor the meta-level 
conditions of cooperative behavior for themselves such as a class of given 
problem, level of improvement during problem solving process, deadline, and 
so forth. With this information, agents select the most adequate strategy by 
using Strategy Selection Knowledge. This selection is done by negotiation of 
agents using Strategy Control Protocol. 

(II) Domain Cooperation Mode: Based on the strategy selected in Strategy 
Selection Mode, strategy-centric cooperation is performed using Problem 
Domain Knowledge and a Problem Domain-oriented Protocol (DoP) in 
Domain Cooperation Mode. A DoP is a protocol specialized to a very limited 
problem class. Problem Domain Knowledge is prepared to fit a specific 
problem domain exclusively. 

The strategy-centric QoS control is realized by transitions back and forth between 
these two modes. When a problem that needs the agents' cooperation occurs, 
agents first negotiate in Strategy Selection Mode to decide the most suitable 
strategy under the given conditions. After a selection is made, the agent moves to 
Domain Cooperation Mode and begins the problem domain-oriented 
cooperation using the specified Problem Domain Knowledge and DoP. When an 
event that affects cooperative problem solving occurs, such as a 
change in the class of the given problem or an approaching deadline, the agent moves to 
Strategy Selection Mode again. These transitions are executed repeatedly. 
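The alternation between the two modes can be pictured as a small state machine. The following Python sketch is only an illustration of the transitions described above and not the implementation used in FVCS; the names `Mode`, `Event`, `next_mode` and `run` are hypothetical, and the negotiation itself is reduced to a single "strategy selected" event.

```python
from enum import Enum, auto


class Mode(Enum):
    STRATEGY_SELECTION = auto()
    DOMAIN_COOPERATION = auto()


class Event(Enum):
    STRATEGY_SELECTED = auto()      # negotiation settled on a strategy
    PROBLEM_CLASS_CHANGED = auto()  # the class of the given problem changed
    DEADLINE_APPROACHING = auto()   # the deadline is coming up


# Events that send an agent back to strategy selection.
RESELECT_EVENTS = {Event.PROBLEM_CLASS_CHANGED, Event.DEADLINE_APPROACHING}


def next_mode(mode: Mode, event: Event) -> Mode:
    """Return the mode after `event`; unrelated events leave the mode as is."""
    if mode is Mode.STRATEGY_SELECTION and event is Event.STRATEGY_SELECTED:
        return Mode.DOMAIN_COOPERATION
    if mode is Mode.DOMAIN_COOPERATION and event in RESELECT_EVENTS:
        return Mode.STRATEGY_SELECTION
    return mode


def run(events, start=Mode.STRATEGY_SELECTION):
    """Fold a sequence of events into the list of visited modes."""
    modes = [start]
    for event in events:
        modes.append(next_mode(modes[-1], event))
    return modes


trace = run([
    Event.STRATEGY_SELECTED,
    Event.DEADLINE_APPROACHING,
    Event.STRATEGY_SELECTED,
])
```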

3.3 M-INTER Architecture 

Fig. 3 shows the architecture of an agent based on the M-INTER model. 

Fig. 3. M-INTER architecture 



(1) INTER Protocol Handler: A simple message handling module that copes 
with inter-agent communication messages. The messages follow the 
INTER Protocol, the primary protocol used for cooperation between agents. 
Table 1 lists the performatives, that is, the communication primitives, 
of this protocol. In the table, "S" stands for the sender of a message, 
while "R" stands for its recipient. 

(2) Problem Domain-oriented Protocol Machine (DoP Machine): Pro- 
tocol handling module to achieve the problem domain-oriented cooperation 
in Domain Cooperation Mode. A DoP Machine consists of a DoP Handler 
and several Knowledge Activators. DoP Handler is a simple parser of DoP, 
while Knowledge Activator decides actions of an agent based on the static 
knowledge. One Knowledge Activator is activated during cooperation. 

(3) Cooperation Strategy Controller: Strategy control module activated in 
Strategy Selection Mode. This module is charged with selection of DoP Ma- 
chine and Knowledge Activator, negotiating with other agents using Strategy 
Control Protocol (Table 2). 

(4) Static Knowledge Base: Container of expert knowledge that is used by 
Cooperation Strategy Controller and Knowledge Activators. 

By applying the architecture described above, we obtain three advantages that 
overcome the limitations of traditional QoS control explained in section 3.1: 
1) Cooperation strategies can be changed flexibly by switching DoP Machines and 
Knowledge Activators. This mechanism provides a wide dynamic range against 
changes in environments and user requirements (solution to P1 stated in 
section 3.1). 2) By describing specialized protocols and knowledge for an ad hoc 
problem in a DoP Machine, high-level cooperation, such as sophisticated 
optimization of the load balance considering the status of both sites and both users, 
can be attained (solution to P2). 3) By constructing knowledge and controllers as 
modules, effective reuse and improved readability are achieved (solution to 
P3). 

Table 1. Performatives of INTER Protocol 

Performative          Summary 
RequestAction         S requests R to do something 
Acceptance            S accepts the RequestAction 
Refusal               S refuses the RequestAction 
RequestInformation    S requests some information from R 
Information           S sends some information to R in reply to RequestInformation 
Report                S sends some information to R 

Table 2. Performatives of Strategy Control Protocol 

Performative                     Summary 
Request-Make-Coop                S requests R to start cooperation 
Acceptance-Make-Coop             S accepts a request from R to start cooperation 
Refusal-Make-Coop                S refuses a request from R to start cooperation 
Request-Close-Coop               S requests R to terminate cooperation 
Request-Change-Protocol          S requests R to change protocol 
Acceptance-Change-Protocol       S accepts a request from R to change protocol 
Refusal-Change-Protocol          S refuses a request from R to change protocol 
Request-Change-Coop-Status       S requests R to change cooperation status 
Acceptance-Change-Coop-Status    S accepts a request from R to change cooperation status 
Refuse-Change-Coop-Status        S refuses a request from R to change cooperation status 
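The performatives of Tables 1 and 2 map naturally onto enumerations. The sketch below is a hypothetical encoding for illustration and not the Tcl/Tk data structures of the prototype: the class names, the `Message` fields and the request/reply pairing in `EXPECTED_REPLIES` are assumptions, the pairing being inferred from the table summaries (no reply is listed for Request-Close-Coop or Report).

```python
from dataclasses import dataclass
from enum import Enum


class Inter(Enum):
    """Performatives of the INTER Protocol (Table 1)."""
    REQUEST_ACTION = "RequestAction"
    ACCEPTANCE = "Acceptance"
    REFUSAL = "Refusal"
    REQUEST_INFORMATION = "RequestInformation"
    INFORMATION = "Information"
    REPORT = "Report"


class StrategyControl(Enum):
    """Performatives of the Strategy Control Protocol (Table 2)."""
    REQUEST_MAKE_COOP = "Request-Make-Coop"
    ACCEPTANCE_MAKE_COOP = "Acceptance-Make-Coop"
    REFUSAL_MAKE_COOP = "Refusal-Make-Coop"
    REQUEST_CLOSE_COOP = "Request-Close-Coop"
    REQUEST_CHANGE_PROTOCOL = "Request-Change-Protocol"
    ACCEPTANCE_CHANGE_PROTOCOL = "Acceptance-Change-Protocol"
    REFUSAL_CHANGE_PROTOCOL = "Refusal-Change-Protocol"
    REQUEST_CHANGE_COOP_STATUS = "Request-Change-Coop-Status"
    ACCEPTANCE_CHANGE_COOP_STATUS = "Acceptance-Change-Coop-Status"
    REFUSE_CHANGE_COOP_STATUS = "Refuse-Change-Coop-Status"


# Assumed pairing of requests with their possible replies.
EXPECTED_REPLIES = {
    Inter.REQUEST_ACTION: {Inter.ACCEPTANCE, Inter.REFUSAL},
    Inter.REQUEST_INFORMATION: {Inter.INFORMATION},
    StrategyControl.REQUEST_MAKE_COOP: {
        StrategyControl.ACCEPTANCE_MAKE_COOP,
        StrategyControl.REFUSAL_MAKE_COOP,
    },
    StrategyControl.REQUEST_CHANGE_PROTOCOL: {
        StrategyControl.ACCEPTANCE_CHANGE_PROTOCOL,
        StrategyControl.REFUSAL_CHANGE_PROTOCOL,
    },
    StrategyControl.REQUEST_CHANGE_COOP_STATUS: {
        StrategyControl.ACCEPTANCE_CHANGE_COOP_STATUS,
        StrategyControl.REFUSE_CHANGE_COOP_STATUS,
    },
}


@dataclass(frozen=True)
class Message:
    sender: str        # "S" in the tables
    recipient: str     # "R" in the tables
    performative: Enum
    content: str = ""


def is_expected_reply(request: Message, reply: Message) -> bool:
    """True if `reply` answers `request` with a matching performative."""
    allowed = EXPECTED_REPLIES.get(request.performative, set())
    return (
        reply.performative in allowed
        and reply.sender == request.recipient
        and reply.recipient == request.sender
    )


request = Message("VCM-A", "VCM-B", StrategyControl.REQUEST_MAKE_COOP)
accept = Message("VCM-B", "VCM-A", StrategyControl.ACCEPTANCE_MAKE_COOP)
mismatch = Message("VCM-B", "VCM-A", Inter.ACCEPTANCE)
```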

3.4 Applying M-INTER Model to FVCS 

To apply the M-INTER model to FVCS, we have defined four types of DoP 
Machines. 

(1) Basic Protocol Machine: A simple protocol machine to control the QoS of 
both sites. Using this protocol, VCM agents direct videoconference processes 
in turn to increase/decrease the values of QoS parameters within a fixed range. 
There are five kinds of Knowledge Activators, each with its own range of 
change. 

(2) Compromise Level Protocol Machine: A deliberative type protocol 
machine to adjust QoS by a trial and error strategy. With this protocol, each VCM 
agent has a mental state on the limit of degradation of QoS parameters, 
namely its compromise level [13]. VCM agents try to find a compromise 
point with each other, changing their compromise levels dynamically. This 
strategy is rather costly, but it can achieve more precise QoS tuning. 

(3) Time Restricted Protocol Machine: A protocol machine that can 
cooperate while considering time restrictions such as a deadline. 

(4) Reactive Protocol Machine: A reactive type protocol machine to reduce 
communication overhead between agents. Although the accuracy of parameter 
tuning is not guaranteed, a quick response to changes is enabled. 
This strategy is used in unstable environments where resources are 
expected to change at very short time intervals. It is also used as a last 
resort when the deadline comes near. 

The behavior of agents based on M-INTER model against the change of CPU 
resource is illustrated in Fig. 4. 

(1) Detection of resource degradation: CPUMonitor-A agent detects deviation of 
CPU resource from acceptable range, and then reports to video- Conf-Manager- 
A (VCM-A) agent with Report message. 

(2) Selection of initial strategy: Cooperation Strategy Controller in VCM-A se- 
lects the Compromise Level Protocol Machine because there is temporal al- 
lowance to deadline. Firstly, ’’Compromise Level 1” Knowledge Activator in 
Compromise Level Protocol Machine of VCM-A is activated and tries to adjust 
parameters of video-A agent within its compromise level. 

(3) Cooperative QoS control with Compromise Level Protocol Machine: If VCM-A
cannot release the resource, it asks VCM-B to collaborate by issuing a
Request-Make-Coop message to VCM-B to establish a cooperation relation.
The “Compromise Level 1” Knowledge Activator of VCM-B is activated and
likewise tries to adjust the parameters of the video-B agent within its
compromise level. If VCM-B cannot release the resource either, the Cooperation
Strategy Controller switches the Knowledge Activator to “Compromise Level 2”.

(4) Change of DoP Machine: When the specified deadline approaches, the
Cooperation Strategy Controller switches the DoP Machine to the Time Restricted
Protocol Machine in order to meet the deadline. With this protocol machine, time
constraints are added to each DoP message, so that the agents can act
punctually.

(5) Termination of cooperative action: When the CPU resource is released, the
cooperation relation between VCM-A and VCM-B is closed.
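
The switching between protocol machines in the steps above can be sketched as
a small selection function. This is only an illustrative Python sketch, not the
authors' Tcl/Tk implementation: the names Situation and
select_protocol_machine, and the two thresholds warn_s and last_s, are
hypothetical, since the text gives no numeric switching conditions.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    remaining_s: float      # seconds left before the deadline
    unstable: bool = False  # resources changing at very short intervals


def select_protocol_machine(s: Situation,
                            warn_s: float = 30.0,
                            last_s: float = 10.0) -> str:
    """Pick a protocol machine; both thresholds are assumed values."""
    # Last resort near the deadline, or when the environment is unstable.
    if s.unstable or s.remaining_s <= last_s:
        return "Reactive"
    # Deadline approaching: add time constraints to the cooperation.
    if s.remaining_s <= warn_s:
        return "TimeRestricted"
    # Enough time left: negotiate precisely via compromise levels.
    return "CompromiseLevel"
```

With these assumed thresholds, a situation with 100 seconds remaining selects
the Compromise Level Protocol Machine, and the choice moves to the Time
Restricted and then the Reactive Protocol Machine as the deadline approaches.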

4 Experiments and Evaluation 

4.1 Implementation 

The proposed architecture based on the M-INTER model is embedded in the
VCM agents of FVCS, as described in Section 3. We used ADIPS Framework/95
[10][11] as the agent-based computing infrastructure. The M-INTER
architecture is written in the Tcl/Tk programming language [14],
extending the agent knowledge architecture provided by the original ADIPS
Framework.

Fig. 4. An example of agents’ cooperation with M-INTER architecture

4.2 Experiments on Agents’ Behavior

The FVCS based on the M-INTER architecture was implemented and tested
in the environment shown in Fig. 5. To evaluate the agents’ flexibility
provided by our model, we deliberately changed the available CPU resources and
monitored the system’s behavior. First, extra load was added externally to the
CPU of WS-B, and we observed the changes in the QoS parameters of the video
process, i.e., frame rate, encoding quality and resolution. At that time, User-A
on WS-A in Fig. 5 gave the highest priority to smoothness of movement, the
second highest to video quality and the lowest to video resolution. User-B on
WS-B, on the other hand, gave the highest priority to video quality, the second
highest to smoothness and the lowest to resolution. When the CPU resources
became insufficient, we gave the system a limited time to solve the problem.
The time limits in the three experiments were 120, 60 and 180 seconds,
respectively.

Figs. 6, 7 and 8 show the transitions of the parameter values controlled by the
agents. In each graph, the x-axis represents time (seconds) and the y-axis
represents the parameter values observed at the recipient site. The parameter
values are expressed as percentages, with the following values regarded as 100%:

- CPU load: 100%

- Smoothness in movement: 35 fps

- Quality: 32-level

- Resolution: 3-level
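
As a minimal illustration of this normalization, the following Python sketch
converts raw parameter values into the percentages plotted in the graphs. The
dictionary keys and the function name to_percent are assumptions made for the
example, and the quality scale is assumed to have 32 levels.

```python
# Full-scale values regarded as 100% (key names are hypothetical).
FULL_SCALE = {
    "cpu_load": 100.0,         # percent
    "smoothness_fps": 35.0,    # frames per second
    "quality_level": 32.0,     # assumed 32-level scale
    "resolution_level": 3.0,   # 3-level scale
}


def to_percent(name: str, value: float) -> float:
    """Express a raw parameter value as a percentage of its full scale."""
    return 100.0 * value / FULL_SCALE[name]
```

For instance, a frame rate of 17.5 fps corresponds to 50% on the y-axis, and
quality level 8 corresponds to 25%.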

In the graphs, a triangle symbol indicates the time at which DoP machines or
Knowledge Activators (KAs) are switched. ‘B-1’, ‘R’, etc. denote the types of
DoP machines and KAs used in each time slot. For example, ‘B-1’ indicates that
KA-1 of the Basic Protocol Machine is used, and ‘R’ indicates that the Reactive
Protocol Machine is used. Basically, when the CPU resources are found to be
insufficient (e.g., because the CPU load of WS-A or WS-B increases), the agents
aim to maintain the stability of the system while considering the user
requirements. In this case, the system may reduce the QoS and consequently
release some CPU resources.

Fig. 6. Behavior of FVCS against CPU variation 1 (time limit 120s). Panel (b): Change of QoS at User-B

Fig. 6 shows the transitions of the parameter values controlled by the
agents when the time limit is 120 seconds. When the CPU load of WS-B
increased (at point A, i.e., after 40 seconds), the agents of FVCS began
cooperative actions. At first, KA-2 of the Basic Protocol Machine (B-2) was
selected. The Basic Protocol Machine has five types of KAs; the larger the KA
number, the steeper the slope of the parameter reduction.
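
The relation between the KA number and the slope can be illustrated with a
short Python sketch. It assumes, purely for illustration, that the reduction
rate grows linearly with the KA number; the text does not state the actual
rule, and reduce_parameter, base_rate and floor_pct are hypothetical names.

```python
def reduce_parameter(value_pct: float, ka_number: int, dt_s: float,
                     base_rate: float = 0.5, floor_pct: float = 0.0) -> float:
    """Lower a QoS parameter (in percent) over dt_s seconds.

    Assumption: the slope is base_rate * ka_number percent per second,
    so a larger KA number gives a steeper decline.
    """
    if not 1 <= ka_number <= 5:
        raise ValueError("the Basic Protocol Machine has KA-1 to KA-5")
    return max(floor_pct, value_pct - base_rate * ka_number * dt_s)
```

Under this assumption, ten seconds under KA-2 lowers a parameter from 100% to
90%, whereas ten seconds under KA-5 lowers it to 75%.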

In the ‘B-2’ region of Fig. 6(a), the resolution of the video provided to User-A
at WS-A was reduced first (points B-C), according to the user priority, and then
the video quality was reduced (points D-E). Meanwhile, in the ‘B-2’ region of
Fig. 6(b), the resolution of the video provided to User-B at WS-B was reduced
(points J-K), and then the smoothness was reduced (points L-M). At this stage,
the Cooperation Strategy Controller was activated. It calculated the remaining
time and the degree of problem solution (in this case, the level of released CPU
resource). Based on these results, it selected KA-4 without changing the
protocol machine. As a result, the parameter value declined sharply between
points M and N. When the remaining time became very short (the warning
stage), the Cooperation Strategy Controller was activated again and changed the
protocol from the Basic to the Reactive Protocol Machine, ‘R’. During a Reactive
Protocol session, only the highest-priority QoS parameter remains unchanged;
the other QoS parameters are decreased to their minimum without considering
any further conditions. Therefore, CPU resources were released (points H-I,
N-O). Within the given time limit (120 seconds), the agents tried different
strategies to satisfy the user requirements as far as possible, and the system
finally succeeded in releasing the CPU resources.

Fig. 7. Behavior of FVCS against CPU variation 2 (time limit 60s)
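
The Reactive Protocol behavior described above (keep only the highest-priority
parameter, drop the others to their minimum) can be sketched in Python as
follows. The function name reactive_step, the parameter names and the numeric
values in the usage example are illustrative assumptions, not measurements
from the experiments.

```python
def reactive_step(current: dict, minimum: dict, priority: list) -> dict:
    """Return QoS settings after a Reactive Protocol session.

    priority lists parameter names from highest to lowest priority.
    Only the highest-priority parameter keeps its current value.
    """
    keep = priority[0]
    return {name: (current[name] if name == keep else minimum[name])
            for name in current}


# User-A ranks smoothness first, then quality, then resolution.
user_a = reactive_step(
    current={"smoothness": 80.0, "quality": 60.0, "resolution": 66.0},
    minimum={"smoothness": 10.0, "quality": 10.0, "resolution": 33.0},
    priority=["smoothness", "quality", "resolution"],
)
```

Here user_a keeps smoothness at 80.0 while quality and resolution fall to
their assumed minimum values of 10.0 and 33.0.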

Fig. 7 shows the transitions of the parameter values controlled by the
agents when the time limit is 60 seconds. When the CPU load increased
(at point P, i.e., after 30 seconds), the agents of FVCS began cooperative
actions. In this case, since the time limit (60 seconds) was shorter than in the
previous experiment, the Cooperation Strategy Controller selected KA-5 of the
Basic Protocol Machine (B-5) early on. From the beginning, the parameters
declined considerably sharply, as shown in Fig. 7 (points Q-R, S-T, W-X, Y-Z).
Furthermore, when the remaining time was about to run out, the system changed
its protocol from the Basic to the Reactive Protocol Machine, ‘R’, and CPU
resources were released (points U-V, a-b). Although the time limit was only
60 seconds, Fig. 7 shows that the system still managed to release the CPU
resources within the limit, albeit with abrupt parameter changes.

Fig. 8. Behavior of FVCS against CPU variation 3 (time limit 180s)

Fig. 8 shows the case in which the time limit was set to 180 seconds. The extra
CPU load was injected at point c (after 40 seconds), and the agents of FVCS
began cooperative actions at this point.

In this case, as the time limit (180 seconds) was longer than the 120-second
limit of the first experiment, the Cooperation Strategy Controller selected
KA-2 of the Basic Protocol Machine (B-2) from the beginning. Since the system
had enough time, it selected the KAs B-3, B-4 and B-5 step by step, without
hurrying to select the Reactive Protocol Machine. Between 140 and 210 seconds,
the WS-B side selected KA-4 (B-4), whilst the WS-A side selected KA-5 (B-5);
that is, WS-A selected a more powerful activator. The reason is simply that the
degree of problem solution differed between the two sides. As the experimental
results show, the agents act flexibly to satisfy the user requirements as far
as they can, and the system finally succeeds in releasing the CPU resources.
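
One way to picture how the remaining time and the degree of problem solution
could lead the two sides to different KAs is the Python sketch below. It is an
assumption-laden illustration, not the knowledge base used in the system: the
function select_ka, the notion of a remaining CPU deficit in percent and the
constant rate_per_ka are all hypothetical.

```python
import math


def select_ka(deficit_pct: float, remaining_s: float,
              rate_per_ka: float = 0.1, max_ka: int = 5) -> int:
    """Pick the smallest KA number that could clear the deficit in time.

    Assumption: KA-k releases about k * rate_per_ka percent of CPU per second.
    """
    if remaining_s <= 0:
        return max_ka
    needed_rate = deficit_pct / remaining_s   # percent per second required
    ka = math.ceil(needed_rate / rate_per_ka)
    return min(max_ka, max(1, ka))
```

With the same remaining time, a side that has solved less of the problem
(a larger remaining deficit) ends up with a larger KA number, which matches
the qualitative behavior described for WS-A and WS-B.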



4.3 Discussion 

Existing QoS control mechanisms, such as the congestion control scheme in IVS,
can adapt to a changing environment, as our system does. IVS adjusts its
outgoing data transmission rate by controlling the parameters of its video coders
based on feedback information about changing network conditions. However, it
does not consider other dynamic properties such as computational resources and
user requirements. Moreover, because its control mechanism is static, it does not
support switching the control strategy according to the class of the problem and
the status of the problem-solving process. Thus, for instance, when a time limit
for recovering QoS is given, or accurate parameter setting is required, it has
difficulty taking cooperative actions in a flexible manner. We conclude from our
experimental results that the proposed system is capable of solving such
problems, for example when a time limit is given. By considering the dynamic
properties of the system, it switches strategies by selecting an appropriate
protocol machine and Knowledge Activator during a session.

The flexibility of the system is achieved because the Cooperation Strategy
Controller monitors the dynamic changes during the cooperation of the VCM
agents. Based on these observations, a knowledge base is used to switch
appropriately to a suitable DoP machine or Knowledge Activator.

5 Conclusion

In this paper, we have proposed a new architecture, called the M-INTER model,
for strategy-centric adaptive QoS control. This model extends the
functions of the QoS control mechanism through sophisticated cooperation among
agents in the Flexible Videoconference System (FVCS). The proposed architecture
analyzes the properties of a QoS problem that has occurred, considers every step
during a session, changes strategies dynamically and thereby solves the problem
more flexibly.

We have implemented the proposed model and carried out experiments
by applying it to FVCS. The experimental results show that flexibility
is improved. Future work includes improving the efficiency and
intelligence of the Cooperation Strategy Controller.